### What problem does this PR solve?
Align MySQL defaults between docker/.env and
docker/service_conf.yaml.template
Close #12645
### Type of change
- [x] Other (please describe): Unify MySQL configuration
### What problem does this PR solve?
When uploading multiple files at once, if any of the files is of an
unsupported type and its blob is not removed, a
TypeError('Object of type bytes is not JSON serializable') exception is raised.
This prevents the frontend from receiving a proper response.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
This PR fixes a `KeyError` crash when running RAPTOR tasks on documents
that don't have the expected vector field.
## Related Issue
Fixes https://github.com/infiniflow/ragflow/issues/12675
## Problem
When running RAPTOR tasks, the code assumes all chunks have the vector
field `q_<size>_vec` (e.g., `q_1024_vec`). However, chunks may not have
this field if:
1. They were indexed with a **different embedding model** (different
vector size)
2. The embedding step **failed silently** during initial parsing
3. The document was parsed before the current embedding model was
configured
This caused a crash:
```
KeyError: 'q_1024_vec'
```
## Solution
Added defensive validation in `run_raptor_for_kb()` (a sketch of the guard follows the list below):
1. **Check for vector field existence** before accessing it
2. **Skip chunks** that don't have the required vector field instead of
crashing
3. **Log warnings** for skipped chunks with actionable guidance
4. **Provide informative error messages** suggesting users re-parse
documents with the current embedding model
5. **Handle both scopes** (`file` and `kb` modes)
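A minimal sketch of this kind of guard (the helper name and chunk access pattern are assumptions based on this description, not the exact diff):

```python
import logging

def _filter_chunks_with_vectors(chunks, vector_size):
    """Keep only chunks that carry the expected vector field; warn about the rest."""
    vector_field = f"q_{vector_size}_vec"
    usable, skipped = [], 0
    for chunk in chunks:
        if vector_field not in chunk:
            skipped += 1
            continue
        usable.append(chunk)
    if skipped:
        logging.warning(
            "Skipped %d chunk(s) missing '%s'. Re-parse the document with the "
            "current embedding model to regenerate vectors.",
            skipped, vector_field,
        )
    return usable
```

In the actual change, the same check is applied in both the `file` and `kb` scopes.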
## Changes
- `rag/svr/task_executor.py`: Added validation and error handling in
`run_raptor_for_kb()`
## Testing
1. Create a knowledge base with an embedding model
2. Parse documents
3. Change the embedding model to one with a different vector size
4. Run RAPTOR task
5. **Before**: Crashes with `KeyError`
6. **After**: Gracefully skips incompatible chunks with informative
warnings
---
<!-- Gittensor Contribution Tag: @GlobalStar117 -->
Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>
### What problem does this PR solve?
Fix: The time zone cannot be updated properly in the database. #12696
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1) Create a dataset using the table parser with Infinity
2) Answer questions in chat using SQL
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
Fixes #12651
The Docker container was failing at startup with:
```
/ragflow/.venv/bin/python3: No module named pip
```
This occurred when `USE_DOCLING=true` because the `entrypoint.sh` tries
to use `uv pip install` to install docling at runtime.
## Root Cause
As explained in the issue:
1. `uv sync` creates a minimal, production-focused environment **without
pip**
2. The production stage copies the venv from builder
3. Runtime commands using `uv pip install` fail because pip is not
present
## Solution
Added `python -m ensurepip --upgrade` after `uv sync` in the Dockerfile
to ensure pip is available in the virtual environment:
```dockerfile
uv sync --python 3.12 --frozen && \
# Ensure pip is available in the venv for runtime package installation (fixes#12651)
.venv/bin/python3 -m ensurepip --upgrade
```
This is a minimal change that:
- Ensures pip is installed during build time
- Doesn't change any other behavior
- Allows runtime package installation via `uv pip install` to work
---
This is a Gittensor contribution.
gittensor:user:GlobalStar117
Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>
### What problem does this PR solve?
Add seekdb as a doc_engine; it is the lite version of OceanBase.
Close #12691
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Update answer concatenation logic to handle overlapping values
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The web frontend build sometimes failed because the Vite build process exhausted the JavaScript heap. Here's what happened:

Root cause:
- When building the web frontend with `npm run build`, Vite needs to bundle, transform, and optimize all JavaScript/TypeScript code.
- Node.js has a default maximum heap size of roughly 2 GB.
- The RAGFlow web application is large enough that the build process exceeded this limit.
- This triggered garbage-collection failures ("Ineffective mark-compacts near heap limit") and eventually crashed with exit code 134 (SIGABRT).

The solution I attempted:
I did not find a simple way to reduce Node.js memory usage, so I added `export NODE_OPTIONS="--max-old-space-size=4096"` to the build step to allocate a 4 GB heap for Node.js during the build.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)

The failing build output, for reference:
=> ERROR [builder 6/8] RUN --mount=type=cache,id=ragflow_npm,target=/ro
53.3s
[builder 6/8] RUN
--mount=type=cache,id=ragflow_npm,target=/root/.npm,sharing=locked cd
web && npm install && npm run build:
4.551
4.551 > prepare
4.551 > cd .. && husky web/.husky
4.551
4.810 .git can't be found
4.833 added 7 packages in 4s
4.833
4.833 499 packages are looking for funding
4.833 run npm fund for details
5.206
5.206 > build
5.206 > vite build --mode production
5.206
5.939 vite v7.3.0 building client environment for production...
6.169 transforming...
6.472
6.472 WARN
6.472
6.472
6.472 WARN warn - As of Tailwind CSS v3.3, the @tailwindcss/line-clamp
plugin is now included by default.
6.472
6.472
6.472 WARN warn - Remove it from the plugins array in your configuration
to eliminate this warning.
6.472
53.14
53.14 <--- Last few GCs --->
53.14
53.14 [41:0x55f82d0] 47673 ms: Scavenge (reduce) 2041.5 (2086.0) ->
2038.7 (2079.7) MB, 6.11 / 0.00 ms (average mu = 0.330, current mu =
0.319) allocation failure;
53.14 [41:0x55f82d0] 47727 ms: Scavenge (reduce) 2039.4 (2079.7) ->
2038.7 (2080.2) MB, 5.34 / 0.00 ms (average mu = 0.330, current mu =
0.319) allocation failure;
53.14 [41:0x55f82d0] 47809 ms: Scavenge (reduce) 2039.6 (2080.2) ->
2038.7 (2080.2) MB, 4.59 / 0.00 ms (average mu = 0.330, current mu =
0.319) allocation failure;
53.14
53.14
53.14 <--- JS stacktrace --->
53.14
53.14 FATAL ERROR: Ineffective mark-compacts near heap limit Allocation
failed - JavaScript heap out of memory
53.14 ----- Native stack trace -----
53.14
53.14 1: 0xb76db1 node::OOMErrorHandler(char const*, v8::OOMDetails
const&) [node]
53.14 2: 0xee62f0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*,
char const*, v8::OOMDetails const&) [node]
53.14 3: 0xee65d7
v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char
const*, v8::OOMDetails const&) [node]
53.14 4: 0x10f82d5 [node]
53.14 5: 0x10f8864
v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector)
[node]
53.14 6: 0x110f754
v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector,
v8::internal::GarbageCollectionReason, char const*) [node]
53.14 7: 0x110ff6c
v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace,
v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
53.14 8: 0x11120ca v8::internal::Heap::HandleGCRequest() [node]
53.14 9: 0x107d737 v8::internal::StackGuard::HandleInterrupts() [node]
53.15 10: 0x151fb9a v8::internal::Runtime_StackGuard(int, unsigned
long*, v8::internal::Isolate*) [node]
53.15 11: 0x1959ef6 [node]
53.22 Aborted
[+] up 0/1
⠙ Image docker-ragflow Building 58.0s
Dockerfile:161
160 | COPY docs docs
161 | >>> RUN
--mount=type=cache,id=ragflow_npm,target=/root/.npm,sharing=locked \
162 | >>> cd web && npm install && npm run build
163 |
failed to solve: process "/bin/bash -c cd web && npm install && npm run
build" did not complete successfully: exit code: 134
View build details:
docker-desktop://dashboard/build/default/default/j68n2ke32cd8bte4y8fs471au
## Summary
Fixes #12631
When SQL query results contain NaN (Not a Number) or Infinity values
(e.g., from division by zero or other calculations), the JSON
serialization would fail because **NaN and Infinity are not valid JSON
values**.
This caused the agent interface to show 'undefined' error, as described
in the issue where `EXAMINE_TIMES` became `NaN` and broke the JSON
parsing.
## Root Cause
The `convert_decimals` function in `exesql.py` was only handling
`Decimal` types, but not `float` values that could be `NaN` or
`Infinity`.
When these invalid JSON values were serialized:
```json
{"EXAMINE_TIMES": NaN} // Invalid JSON!
```
The frontend JSON parser would fail, causing the 'undefined' error.
## Solution
Extended `convert_decimals` to detect `float` values and convert
`NaN`/`Infinity` to `null` before JSON serialization:
```python
if isinstance(obj, float):
if math.isnan(obj) or math.isinf(obj):
return None
return obj
```
This ensures all SQL results can be properly serialized to valid JSON.
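For reference, a recursive converter along these lines might look as follows; this is a sketch, not the exact code in `exesql.py`:

```python
import math
from decimal import Decimal

def convert_decimals(obj):
    """Recursively sanitize SQL results for JSON: Decimal -> float, NaN/Infinity -> None."""
    if isinstance(obj, Decimal):
        return convert_decimals(float(obj))  # re-check: Decimal('NaN') also becomes None
    if isinstance(obj, float):
        if math.isnan(obj) or math.isinf(obj):
            return None
        return obj
    if isinstance(obj, dict):
        return {k: convert_decimals(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [convert_decimals(v) for v in obj]
    return obj
```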
---
This is a Gittensor contribution.
gittensor:user:GlobalStar117
Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
### What problem does this PR solve?
As title.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
In `paragraph()` of class `FulltextQueryer`, `len(keywords) / 10` should be
rounded to an integer before being assigned to `minimum_should_match` (a one-line sketch follows below).
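A minimal sketch of the intended fix, assuming `minimum_should_match` must be an integer:

```python
keywords = ["fund", "manager", "risk", "report", "2024", "q3", "revenue"]

# Before: len(keywords) / 10 is a float (0.7 here), which the query DSL rejects.
minimum_should_match = int(len(keywords) / 10)  # -> 0, an integer
```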
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
When database connection is lost, the reconnection logic had a bug: if
the first reconnect attempt failed, the second attempt was not wrapped
in error handling, causing unhandled exceptions.
## Solution
Added proper try-except blocks around the second reconnect attempt in
both MySQL and PostgreSQL database classes to ensure errors are properly
logged and handled.
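A simplified sketch of the pattern, using peewee-style method names; the exact attributes in the RAGFlow classes may differ:

```python
import logging

class ReconnectingDatabase:
    """Illustrative stand-in for RetryingPooledMySQLDatabase / RetryingPooledPostgresqlDatabase."""

    def _handle_connection_loss(self):
        try:
            self.close()
            self.connect(reuse_if_open=True)
            return
        except Exception as first_error:
            logging.warning("First reconnect attempt failed: %s", first_error)
        try:
            # The second attempt is now also guarded, so a failure is logged
            # and re-raised instead of surfacing as an unhandled exception.
            self.close()
            self.connect(reuse_if_open=True)
        except Exception as second_error:
            logging.error("Second reconnect attempt failed: %s", second_error)
            raise
```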
## Changes
- Fixed `_handle_connection_loss()` in `RetryingPooledMySQLDatabase`
- Fixed `_handle_connection_loss()` in
`RetryingPooledPostgresqlDatabase`
Fixes #12294
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=158349177
Co-authored-by: SID <158349177+0xsid0703@users.noreply.github.com>
### What problem does this PR solve?
```
$ python admin/client/ragflow_cli.py -t user -u aaa@aaa.com -p 9380
ragflow> list datasets;
ragflow> list default models;
ragflow> show version;
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
This PR extends the RAGFlow Admin API and CLI with comprehensive user
API token management capabilities. Administrators can now generate,
list, and delete API tokens for users through both the REST API and the
Admin CLI interface.
## Changes
### Backend API (`admin/server/`)
#### New Endpoints
- **POST `/api/v1/admin/users/<username>/new_token`** - Generate a new
API token for a user
- **GET `/api/v1/admin/users/<username>/token_list`** - List all API
tokens for a user
- **DELETE `/api/v1/admin/users/<username>/token/<token>`** - Delete a
specific API token for a user
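A hedged usage example with `requests` (the base URL, port, admin auth header, and response shapes are assumptions; adapt them to your deployment):

```python
import requests
from urllib.parse import quote

BASE = "http://localhost:9380/api/v1/admin"          # assumed admin server address
HEADERS = {"Authorization": "Bearer <admin-token>"}   # assumed admin auth scheme

# Generate a new API token for a user
resp = requests.post(f"{BASE}/users/alice@example.com/new_token", headers=HEADERS)
print(resp.json())  # standard format: {"code": 0, "data": {...}, "message": ...}

# List the user's API tokens
resp = requests.get(f"{BASE}/users/alice@example.com/token_list", headers=HEADERS)
tokens = resp.json().get("data", [])

# Delete a specific token (URL-encode it to handle special characters)
if tokens:
    token = tokens[0]["token"]
    requests.delete(f"{BASE}/users/alice@example.com/token/{quote(token, safe='')}",
                    headers=HEADERS)
```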
#### Service Layer Updates (`services.py`)
- Added `get_user_api_key(username)` - Retrieves all API tokens for a
user
- Added `save_api_token(api_token)` - Saves a new API token to the
database
- Added `delete_api_token(username, token)` - Deletes an API token for a
user
### Admin CLI (`admin/client/`)
#### New Commands
- **`GENERATE TOKEN FOR USER <username>;`** - Generate a new API token
for the specified user
- **`LIST TOKENS OF <username>;`** - List all API tokens associated with
a user
- **`DROP TOKEN <token> OF <username>;`** - Delete a specific API token
for a user
### Testing
Added comprehensive test suite in `test/testcases/test_admin_api/`:
- **`test_generate_user_api_key.py`** - Tests for API token generation
- **`test_get_user_api_key.py`** - Tests for listing user API tokens
- **`test_delete_user_api_key.py`** - Tests for deleting API tokens
- **`conftest.py`** - Shared test fixtures and utilities
## Technical Details
### Token Generation
- Tokens are generated using `generate_confirmation_token()` utility
- Each token includes metadata: `tenant_id`, `token`, `beta`,
`create_time`, `create_date`
- Tokens are associated with user tenants automatically
### Security Considerations
- All endpoints require admin authentication (`@check_admin_auth`)
- Tokens are URL-encoded when passed in DELETE requests to handle
special characters
- Proper error handling for unauthorized access and missing resources
### API Response Format
All endpoints follow the standard RAGFlow response format:
```json
{
"code": 0,
"data": {...},
"message": "Success message"
}
```
## Files Changed
- `admin/client/admin_client.py` - CLI token management commands
- `admin/server/routes.py` - New API endpoints
- `admin/server/services.py` - Token management service methods
- `docs/guides/admin/admin_cli.md` - CLI documentation updates
- `test/testcases/test_admin_api/conftest.py` - Test fixtures
- `test/testcases/test_admin_api/test_user_api_key_management/*` - Test
suites
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Alexander Strasser <alexander.strasser@ondewo.com>
Co-authored-by: Hetavi Shah <your.email@example.com>
### What problem does this PR solve?
Skip duplicate errors to avoid 'create_idx' failures caused by slow
metadata refresh or external modifications.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes Infinity-specific API regressions: preserves the `important_kwd`
round-trip for `[""]`, restores the required highlight key in retrieval
responses, and enforces Infinity guards for unsupported
`parser_id=tag` and pagerank in `/v1/kb/update`. Also removes a
slow/buggy pandas row-wise apply that was throwing `ValueError` and
causing flakiness.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
This commit fixes multiple issues preventing PDF Generator (Docs
Generator) output variables from being visible in the Output section and
available to downstream nodes.
### What problem does this PR solve?
Issues Fixed:
1. PDF Generator nodes initialized with empty object instead of proper
initial values
2. Output structure mismatch (had 'value' property that system doesn't
expect)
3. Missing 'download' output in form schema
4. Output list computed from static values instead of form state
5. Added null/undefined guard to transferOutputs function
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Changes:
- web/src/pages/agent/constant/index.tsx: Fixed output structure in
initialPDFGeneratorValues
- web/src/pages/agent/hooks/use-add-node.ts: Initialize PDF Generator
with proper values
- web/src/pages/agent/form/pdf-generator-form/index.tsx: Fixed schema
and use form.watch
- web/src/pages/agent/form/components/output.tsx: Added null guard and
spacing
### What problem does this PR solve?
Fix: In the agent loop, if the await response is selected as the
variable, the operator cannot be selected. #12656
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: duplicate content in chunk #12336
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes web API behavior mismatches that caused test failures by
normalizing error responses, tightening validations, correcting error
messages, and closing upload file handles.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix shell variable expansion to preserve $ in password defaults when
env vars are unset. Fixes Azure RDS auto-rotated passwords (that contain
$) being
truncated during template processing.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Modified and optimized the metadata condition card component.
Fix: Use startOfDay and endOfDay to ensure the date range includes a
full day.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
The \`important_kwd\` field in Infinity connector was using mismatched
separators:
- **Storage**: \`list2str(v)\` uses space as default separator
- **Reading**: \`v.split()\` splits by all whitespace
This causes multi-word keywords like \`\"Senior Fund Manager\"\` to be
incorrectly split into \`[\"Senior\", \"Fund\", \"Manager\"]\`.
## Solution
Use comma \`,\` as separator for both storing and reading, consistent
with:
1. The LLM output format in \`keyword_prompt.md\` (\"delimited by
ENGLISH COMMA\")
2. The \`cached.split(\",\")\` in \`task_executor.py\`
## Changes
- \`insert()\`: \`list2str(v)\` → \`list2str(v, \",\")\`
- \`update()\`: \`list2str(v)\` → \`list2str(v, \",\")\`
- \`get_fields()\`: \`v.split()\` → \`v.split(\",\") if v else []\`
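A minimal sketch of the corrected round trip, with a simplified stand-in for `list2str`:

```python
def list2str(values, sep=" "):
    """Join a list of strings with the given separator (simplified stand-in)."""
    return sep.join(values)

keywords = ["Senior Fund Manager", "Portfolio Risk"]

# Store with an explicit comma separator...
stored = list2str(keywords, ",")             # "Senior Fund Manager,Portfolio Risk"

# ...and read back with the matching separator, guarding the empty-string case.
restored = stored.split(",") if stored else []
assert restored == keywords
```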
## Impact
This bug affects:
- Python-level reranking weight calculation (`important_kwd * 5`)
- API response keyword display
- Search precision due to fragmented keywords
### What problem does this PR solve?
Fix: Editing the agent greeting causes the greeting to be continuously
added to the message list. #12635
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Fixes #12520 - Deleted chunks should not appear in retrieval/reference
results.
## Changes
### Core Fix
- **api/apps/chunk_app.py**: Include `doc_id` in the delete condition to
properly scope the delete operation (see the sketch below)
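A hedged sketch of the scoped deletion (the connector signature here is illustrative, not the exact doc-store API):

```python
def delete_document_chunks(doc_store_conn, index_name, kb_id, doc_id, chunk_ids):
    """Delete chunks by id, scoped to the owning document so nothing outside it is touched."""
    condition = {"id": chunk_ids, "doc_id": doc_id}  # compound condition: ids AND doc_id
    return doc_store_conn.delete(condition, index_name, kb_id)
```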
### Improved Error Handling
- **api/db/services/document_service.py**: Better separation of concerns
with individual try-catch blocks and proper logging for each cleanup
operation
### Doc Store Updates
- **rag/utils/es_conn.py**: Updated delete query construction to support
compound conditions
- **rag/utils/opensearch_conn.py**: Same updates for OpenSearch
compatibility
### Tests
- **test/testcases/.../test_retrieval_chunks.py**: Added
`TestDeletedChunksNotRetrievable` class with regression tests
- **test/unit/test_delete_query_construction.py**: Unit tests for delete
query construction
## Testing
- Added regression tests that verify deleted chunks are not returned by
retrieval API
- Tests cover single chunk deletion and batch deletion scenarios
### What problem does this PR solve?
Fix regex pattern validation in split_with_pattern (#12605)
- Add try-except block to validate user-provided regex patterns before
use
- Gracefully fall back to a single chunk when an invalid regex is provided
- Prevent server crash during DOCX parsing with malformed delimiters
## Problem
Parsing DOCX files with custom regex delimiters crashes with `re.error:
nothing to repeat at position 9` when users provide invalid regex
patterns.
Closes #12605
## Solution
Validate and compile regex pattern before use. On invalid pattern, log
warning and return content as single chunk instead of crashing.
## Changes
- `rag/nlp/__init__.py`: Add regex validation in `split_with_pattern()`
function
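A minimal sketch of the validation (simplified; the real `split_with_pattern()` handles more options):

```python
import logging
import re

def split_with_pattern(content: str, pattern: str) -> list[str]:
    """Split content by a user-provided regex, falling back to a single chunk on bad patterns."""
    try:
        compiled = re.compile(pattern)
    except re.error as e:
        logging.warning("Invalid delimiter pattern %r (%s); returning content as a single chunk.",
                        pattern, e)
        return [content]
    return [piece for piece in compiled.split(content) if piece]
```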
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=42954461
### What problem does this PR solve?
Fixes #12570 - The slicing method dropdown was empty when deploying
RAGFlow v0.23.1 from source code.
The issue occurred because `parser_ids` from the tenant info was empty
or undefined, causing `useSelectParserList` to return an empty array.
This PR adds a fallback to a default parser list when `parser_ids` is
empty, ensuring the dropdown always has options.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=94194147
### What problem does this PR solve?
Feat: Hash doc id to avoid duplicate name.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Description
Fixes connection error handling when langfuse service is unavailable.
The application now gracefully handles connection failures instead of
crashing.
## Changes
- Wrapped `langfuse.auth_check()` calls in try-except blocks in:
- `api/db/services/dialog_service.py`
- `api/db/services/tenant_llm_service.py`
## Problem
When langfuse service is unavailable or connection is refused,
`langfuse.auth_check()` throws `httpx.ConnectError: [Errno 111]
Connection refused`, causing the application to crash during document
parsing or dialog operations.
## Solution
Added try-except blocks around `langfuse.auth_check()` calls to catch
connection errors and gracefully skip langfuse tracing instead of
crashing. The application continues functioning normally even when
langfuse is unavailable.
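A minimal sketch of the guard, simplified from the two call sites:

```python
import logging

def langfuse_is_reachable(langfuse) -> bool:
    """Return True only if the Langfuse credentials check succeeds; never raise."""
    try:
        return bool(langfuse.auth_check())
    except Exception as e:  # e.g. httpx.ConnectError when the service is down
        logging.warning("Langfuse unavailable, skipping tracing: %s", e)
        return False
```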
## Related Issue
Fixes #12621
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=158349177
### What problem does this PR solve?
Fix: Fix the styles of the multi-select component and the filter pop-up.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes#12604 - DOCX files containing hyperlinks to internal bookmarks
(e.g., `#_文档目录`) cause a `KeyError` during parsing:
```
KeyError: "There is no item named 'word/#_文档目录' in the archive"
```
This happens because python-docx incorrectly tries to read internal
bookmark references as files from the ZIP archive. Internal bookmarks
are relationship targets starting with `#` and are not actual files.
This PR extends the existing `load_from_xml_v2` workaround (which
already handles `NULL` targets) to also skip relationship targets
starting with `#`.
Related upstream issue:
https://github.com/python-openxml/python-docx/issues/902
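A minimal sketch of the extended check (the helper name is hypothetical; in the actual patch the guard lives inside the relationship-loading loop of the patched loader):

```python
def is_loadable_target(target_ref) -> bool:
    """Return False for relationship targets that are not real parts in the DOCX archive."""
    if not target_ref or target_ref == "NULL":
        return False                      # existing workaround for NULL targets
    if target_ref.startswith("#"):        # internal bookmark, e.g. "#_文档目录"
        return False                      # new case handled by this PR
    return True
```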
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=94194147