ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-27 13:46:39 +08:00

Author	SHA1	Message	Date
Yongteng Lei	13076bb87b	Fix: Parent chunking fails on DOCX files (#12822 ) ### What problem does this PR solve? Fixes parent chunking failing on DOCX files. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-26 17:55:09 +08:00
balibabu	e04cd99ae2	Feat: Add the history field to the agent's system variables. #7322 (#12823 ) ### What problem does this PR solve? Feat: Add the history field to the agent's system variables. #7322 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-26 17:54:30 +08:00
Jin Hai	41905e2569	Update RAGFlow CLI (#12816 ) ### What problem does this PR solve? Improve performance slightly. ### Type of change - [x] Refactoring - [x] Performance Improvement Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-26 12:58:04 +08:00
Stephen Hu	0782a7d3c6	Refactor: improve task cancellation checks in RAPTOR (#12813 ) ### What problem does this PR solve? Introduced a helper method _check_task_canceled to centralize and simplify task cancellation checks throughout RecursiveAbstractiveProcessing4TreeOrganizedRetrieval. This reduces code duplication and improves maintainability. ### Type of change - [x] Refactoring	2026-01-26 11:34:54 +08:00
LIRUI YU	4236a62855	Fix: Cancel tasks before document or datasets deletion to prevent queue blocking (#12799 ) ### What problem does this PR solve? When a knowledge base is deleted, its records in the Document and Knowledgebase tables are removed immediately. However, if the user did not stop the running tasks before deleting the knowledge base, a large number of pending task messages remain in the Redis queue (the asynchronous queue). TaskService.get_task() uses a JOIN query across three tables (Task ← Document ← Knowledgebase). Because the Document/Knowledgebase records have been deleted, the JOIN returns an empty result even though the Task records still exist. The task executor therefore treats each task as nonexistent ("collect task xxx is unknown") and can only skip it and log a warning: 2026-01-23 16:43:21,716 WARNING 1190179 collect task 110fbf70f5bd11f0945a23b0930487df is unknown 2026-01-23 16:43:21,818 WARNING 1190179 collect task 11146bc4f5bd11f0945a23b0930487df is unknown 2026-01-23 16:43:21,918 WARNING 1190179 collect task 111c3336f5bd11f0945a23b0930487df is unknown 2026-01-23 16:43:22,021 WARNING 1190179 collect task 112471b8f5bd11f0945a23b0930487df is unknown 2026-01-23 16:43:26,719 WARNING 1190179 collect task 112e855ef5bd11f0945a23b0930487df is unknown 2026-01-23 16:43:26,734 WARNING 1190179 collect task 1134380af5bd11f0945a23b0930487df is unknown 2026-01-23 16:43:26,834 WARNING 1190179 collect task 1138cb2cf5bd11f0945a23b0930487df is unknown As a consequence, a large number of such tasks occupy the queue's processing capacity, causing new tasks to queue and wait. <img width="1910" height="947" alt="9a00f2e0-9112-4dbb-b357-7f66b8eb5acf" src="https://github.com/user-attachments/assets/0e1227c2-a2df-4ef3-ba8f-e04c3f6ef0e1" /> Solution: add logic to stop all ongoing tasks before deleting the knowledge base and its tasks. ### Type of change - Bug Fix (non-breaking change which fixes an issue)	2026-01-26 10:45:59 +08:00
Da22wei	9afb5bc136	Add Copilot setting and conventions (#12807 ) ### What problem does this PR solve? Added project instructions for setting up and running the application. ### Type of change - [x] Documentation Update	2026-01-26 10:44:20 +08:00
Kevin Hu	f0fcf8aa9a	Fix: reset conversation variables. (#12814 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-26 10:43:57 +08:00
Jin Hai	274fc5ffaa	Fix RAGFlow CLI bug (#12811 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-25 23:08:59 +08:00
writinwaters	80a16e71df	Docs: Added webhook specific configuration tips (#12802 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2026-01-23 22:09:49 +08:00
balibabu	6220906164	Fix: Fixed the error on the login page. (#12801 ) ### What problem does this PR solve? Fix: Fixed the error on the login page. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-23 18:58:54 +08:00
Jimmy Ben Klieve	fa5284361c	feat: support admin assign superuser in admin ui (#12798 ) ### What problem does this PR solve? Allow superuser(admin) to grant or revoke other superuser. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-23 18:08:46 +08:00
Lynn	f3923452df	Fix: add tokenized content (#12793 ) ### What problem does this PR solve? Add tokenized content es field to query zh message. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-23 16:56:03 +08:00
chanx	11470906cf	Fix: Metadata time Picker (#12796 ) ### What problem does this PR solve? Fix: Metadata time Picker ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-23 16:55:43 +08:00
Jin Hai	e1df82946e	RAGFlow CLI: ping server before input password when login user (#12791 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-23 15:03:05 +08:00
Kevin Hu	08c01b76d5	Fix: missing parent chunk issue. (#12789 ) ### What problem does this PR solve? Close #12783 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-23 12:54:08 +08:00
apps-lycusinc	678392c040	feat(deepdoc): add configurable ONNX thread counts and GPU memory shrinkage (#12777 ) ### What problem does this PR solve? This PR addresses critical memory and CPU resource management issues in high-concurrency environments (multi-worker setups): GPU Memory Exhaustion (OOM): Currently, onnxruntime-gpu uses an aggressive memory arena that does not effectively release VRAM back to the system after a task completes. In multi-process worker setups ($WS > 4), this leads to BFCArena allocation failures and OOM errors as workers "hoard" VRAM even when idle. This PR introduces an optional GPU Memory Arena Shrinkage toggle to mitigate this issue. CPU Oversubscription: ONNX intra_op and inter_op thread counts are currently hardcoded to 2. When running many workers, this causes significant CPU context-switching overhead and degrades performance. This PR makes these values configurable to match the host's actual CPU core density. Multi-GPU Support: The memory management logic has been improved to dynamically target the correct device_id, ensuring stability on systems with multiple GPUs. Transparency: Added detailed initialization logs to help administrators verify and troubleshoot their ONNX session configurations. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: shakeel <shakeel@lollylaw.com>	2026-01-23 11:36:28 +08:00
Julien Deveaux	6be197cbb6	Fix: Use tiktoken for proper token counting in OpenAI-compatible endpoint #7850 (#12760 ) ### What problem does this PR solve? The OpenAI-compatible chat endpoint (`/chats_openai/<chat_id>/chat/completions`) was not returning accurate token usage in streaming responses. The token counts were either missing or inaccurate because the underlying LLM API responses weren't being properly parsed for usage data. This PR adds proper token counting using tiktoken (cl100k_base encoding) as a fallback when the LLM API doesn't provide usage data in streaming chunks. This ensures clients always receive token usage information in the response, which is essential for billing and quota management. Changes: - Add tiktoken-based token counting for streaming responses in OpenAI-compatible endpoint - Ensure `usage` field is always populated in the final streaming chunk - Add unit tests for token usage calculation Fixes #7850 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-23 09:36:21 +08:00
balibabu	8dd4a41bf8	Feat: Add a web search button to the chat box on the chat page. (#12786 ) ### What problem does this PR solve? Feat: Add a web search button to the chat box on the chat page. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-23 09:33:50 +08:00
chanx	e9453a3971	Fix: Metadata supports precise time selection (#12785 ) ### What problem does this PR solve? Fix: Metadata supports precise time selection ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-23 09:33:34 +08:00
balibabu	7c9b6e032b	Fix: The minimum size of the historical message window for the classification operator is 1. #12778 (#12779 ) ### What problem does this PR solve? Fix: The minimum size of the historical message window for the classification operator is 1. #12778 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-22 19:45:25 +08:00
Kevin Hu	3beb85efa0	Feat: enhance metadata arranging. (#12745 ) ### What problem does this PR solve? #11564 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-22 15:34:08 +08:00
LIRUI YU	bc7b864a6c	top_k parameter ignored, always returned page_size results (#12753 ) ### What problem does this PR solve? Backend: \rag\nlp\search.py. Before the fix: the top_k parameter was not applied to limit the total number of chunks, and the rerank model used the whole valid_idx rather than first assigning valid_idx = valid_idx[:top]. After the fix: the top_k limit is applied to the total results before pagination, using a default value of top = 1024 if top_k is not modified. session.py. Before the fix: when the frontend calls the retrieval API with `search_id`, the backend only reads `meta_data_filter` from the saved `search_config`. The `rerank_id`, `top_k`, `similarity_threshold`, and `vector_similarity_weight` parameters are only taken from the direct request body. Since the frontend doesn't pass these parameters explicitly (it only passes `search_id`), they always fall back to default values: - `similarity_threshold` = 0.0 - `vector_similarity_weight` = 0.3 - `top_k` = 1024 - `rerank_id` = "" (no rerank) This means user settings saved in the Search Settings page have no effect on actual search results. After the fix: when a `search_id` is provided, the backend now reads all relevant configuration from the saved `search_config`, including `rerank_id`, `top_k`, `similarity_threshold`, and `vector_similarity_weight`. Request parameters can still override these values if explicitly provided, allowing flexibility. The rerank model is now properly instantiated using the configured `rerank_id`, making the rerank feature actually work. Frontend: \web\src\pages\next-search\search-setting.tsx. Before the fix: in search-setting.tsx, the top_k input box is only displayed when rerank is enabled (wrapped in the rerankModelDisabled condition). If the rerank switch is turned off, the top_k input field is hidden, but the form value remains unchanged. In other words: - When rerank is enabled, users can modify top_k (default 1024). - When rerank is disabled, top_k retains the previous value, but it is not visible in the interface. Therefore, the backend always receives the top_k parameter; the frontend UI merely binds this configuration item to the rerank switch. When rerank is turned off, top_k does not automatically reset to 1024 but retains its original value. After the fix: switching off the rerank model resets top_k to 1024. Incidentally, if top_k is handled in its own method rather than inside the retrieval method, it can be controlled separately. All methods now work. Using rerank: <img width="2378" height="1565" alt="Screenshot 2026-01-21 190206" src="https://github.com/user-attachments/assets/fa2b0df0-1334-4ca3-b169-da6c5fd59935" /> Not using rerank: <img width="2596" height="1559" alt="Screenshot 2026-01-21 190229" src="https://github.com/user-attachments/assets/c5a80522-a0e1-40e7-b349-42fe86df3138" /> Before the fix, the two results were the same. ### Type of change - Bug Fix (non-breaking change which fixes an issue)	2026-01-22 15:33:42 +08:00
zhanxin.xu	93091f4551	[Feat]Automatic table orientation detection and correction (#12719 ) ### What problem does this PR solve? This PR introduces automatic table orientation detection and correction within the PDF parser. This ensures that tables in PDFs are correctly oriented before structure recognition, improving overall parsing accuracy. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2026-01-22 12:47:55 +08:00
会敲代码的喵	2d9e7b4acd	Fix: aliyun oss need to use s3 signature_version (#12766 ) ### What problem does this PR solve? Aliyun OSS does not support boto's s3v4 signature_version, which leads to an error: ``` botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the PutObject operation: aws-chunked encoding is not supported with the specified x-amz-content-sha256 value ``` According to the Aliyun OSS docs, oss_conn needs to use the s3 signature_version. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-22 11:43:55 +08:00
天海蒼灆	6f3f69b62e	Feat: API adds audio to text and text to speech functions (#12764 ) ### What problem does this PR solve? API adds audio to text and text to speech functions ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-22 11:20:26 +08:00
chanx	bfd5435087	Fix: After deleting metadata in batches, the selected items need to be cleared. (#12767 ) ### What problem does this PR solve? Fix: After deleting metadata in batches, the selected items need to be cleared. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-22 11:20:11 +08:00
balibabu	0e9fe68110	Feat: Adjust the icons in the chat page's collapsible panel. (#12755 ) ### What problem does this PR solve? Feat: Adjust the icons in the chat page's collapsible panel. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-22 09:48:44 +08:00
Jin Hai	89f438fe45	Add ping command to test ping API (#12757 ) ### What problem does this PR solve? As title. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-22 00:18:29 +08:00
Jin Hai	2e2c8f6ca9	Add more commands to RAGFlow CLI (#12731 ) ### What problem does this PR solve? This PR enables the RAGFlow CLI to access RAGFlow as a normal user and to work as a testing tool for the RAGFlow server. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-21 18:49:52 +08:00
balibabu	6cd4fd91e6	Fix: Allow classification operators to be followed by other classification operators. #9082 (#12744 ) ### What problem does this PR solve? Fix: Allow classification operators to be followed by other classification operators. #9082 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-21 16:24:39 +08:00
chanx	83e17d8c4a	Fix: Optimize the metadata code structure to implement metadata list structure functionality. (#12741 ) ### What problem does this PR solve? Fix: Optimize the metadata code structure to implement metadata list structure functionality. #11564 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-21 16:15:43 +08:00
balibabu	e1143d40bc	Feat: Add a think button to the chat box. #12742 (#12743 ) ### What problem does this PR solve? Feat: Add a think button to the chat box. #12742 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-21 15:39:18 +08:00
Liu An	f98abf14a8	Refa(test): improve code formatting and remove debug prints (#12739 ) ### What problem does this PR solve? - Improving code formatting and consistency - Removing debug print statements ### Type of change - [x] Refactoring	2026-01-21 14:53:17 +08:00
Liu An	2a87778e10	Chore(ci): use new Web API test cases in CI (#12738 ) ### What problem does this PR solve? - Update pytest commands to use new test directory structure ### Type of change - [x] chore(ci)	2026-01-21 14:53:05 +08:00
Stephen Hu	5836823187	Refactor:better handle list agent api desc param (#12733 ) ### What problem does this PR solve? better handle list agent api desc param ### Type of change - [x] Refactoring	2026-01-21 13:09:27 +08:00
chanx	5a7026cf55	Feat: Improve metadata logic (#12730 ) ### What problem does this PR solve? Feat: Improve metadata logic ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-21 11:31:26 +08:00
LGRY	bc7935d627	feat: add batch delete for conversations in chat (web) (#12584 ) Resolves #12572 ## What problem does this PR solve? The conversation list in chat sessions previously only supported deleting conversations one by one. This was inefficient when users needed to clean up multiple conversations. This PR adds batch delete functionality to improve user experience. ## Type of change - [x] New Feature (non-breaking change which adds functionality) ## Specific changes - Add selection mode with checkboxes for conversation list - Add batch delete functionality with custom icons - Add internationalization support (en/zh) - Use existing removeConversation API which supports batch deletion ## UI modification status - Default: Show [+] and [batch delete icon] - Selection mode: Show checkboxes, keep [+] and [select all icon] - Items selected: Show [return icon] and [red trash icon] ### Repair Comparison 1. Before Repair <img width="982" height="1221" alt="image" src="https://github.com/user-attachments/assets/8a80f7c0-7da6-41ec-9d1a-ac887ede96ba" /> 2. After Repair <img width="1273" height="919" alt="Screenshot of the new batch delete feature" src="https://github.com/user-attachments/assets/e179bdf3-3779-4bd5-84b6-8e24780a22ea" /> --- Co-authored-by: Gongzi --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-20 19:13:53 +08:00
Haipeng LI	7787085664	Doc: add README for test (#12728 ) ### What problem does this PR solve? We added instructions on how to test RAGFlow in test/README.md. ### Type of change - [x] Documentation Update	2026-01-20 19:12:35 +08:00
6ba3i	960ecd3158	Feat: update and add new tests for web api apps (#12714 ) ### What problem does this PR solve? This PR adds missing web API tests (system, search, KB, LLM, plugin, connector). It also addresses a contract mismatch that was causing test failures: metadata updates did not persist new keys (update‑only behavior). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): Test coverage expansion and test helper instrumentation	2026-01-20 19:12:15 +08:00
6ba3i	aee9860970	Make document change-status idempotent for Infinity doc store (#12717 ) ### What problem does this PR solve? This PR makes the document change‑status endpoint idempotent under the Infinity doc store. If a document already has the requested status, the handler returns success without touching the engine, preventing unnecessary updates and avoiding missing‑table errors while keeping responses consistent. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-20 19:11:21 +08:00
Jimmy Ben Klieve	9ebbc5a74d	chore: redirect to login page if api reports unauthorized in admin page (#12726 ) ### What problem does this PR solve? Automatically redirect to the login page if the API reports `401: Unauthorized` on ANY Admin page. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-20 18:58:13 +08:00
Jimmy Ben Klieve	1c65f64bda	fix: missing route for user detail page (#12725 ) ### What problem does this PR solve? Add missing route for navigating to `/admin/users/:id` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-20 18:55:44 +08:00
Kevin Hu	32841549c1	Fix: Not within a request context (#12723 ) ### What problem does this PR solve? ERROR 1819426 Unhandled exception during request Traceback (most recent call last): File "/home/qinling/github.com/infiniflow/ragflow/api/apps/document_app.py", line 639, in run return await thread_pool_exec(_run_sync) File "/home/qinling/github.com/infiniflow/ragflow/common/misc_utils.py", line 132, in thread_pool_exec return await loop.run_in_executor(_thread_pool_executor(), func, args) File "/usr/lib/python3.12/asyncio/futures.py", line 287, in __await__ yield self # This tells Task to wait for completion. File "/usr/lib/python3.12/asyncio/tasks.py", line 385, in __wakeup future.result() File "/usr/lib/python3.12/asyncio/futures.py", line 203, in result raise self._exception.with_traceback(self._exception_tb) File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run result = self.fn(self.args, **self.kwargs) File "/home/qinling/github.com/infiniflow/ragflow/api/apps/document_app.py", line 593, in _run_sync if not DocumentService.accessible(doc_id, current_user.id): File "/home/qinling/github.com/infiniflow/ragflow/.venv/lib/python3.12/site-packages/werkzeug/local.py", line 318, in __get__ obj = instance._get_current_object() File "/home/qinling/github.com/infiniflow/ragflow/.venv/lib/python3.12/site-packages/werkzeug/local.py", line 526, in _get_current_object return get_name(local()) File "/home/qinling/github.com/infiniflow/ragflow/api/apps/__init__.py", line 97, in _load_user authorization = request.headers.get("Authorization") File "/home/qinling/github.com/infiniflow/ragflow/.venv/lib/python3.12/site-packages/werkzeug/local.py", line 318, in __get__ obj = instance._get_current_object() File "/home/qinling/github.com/infiniflow/ragflow/.venv/lib/python3.12/site-packages/werkzeug/local.py", line 519, in _get_current_object raise RuntimeError(unbound_message) from None RuntimeError: Not within a request context ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-20 16:56:41 +08:00
writinwaters	046d4ffdef	Docs: Updated configuration file name (#12720 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2026-01-20 15:40:03 +08:00
longbingljw	4c4d434bc1	Unify MySQL configuration (#12644 ) ### What problem does this PR solve? Align MySQL defaults between docker/.env and docker/service_conf.yaml.template close #12645 ### Type of change - [x] Other (please describe):Unify MySQL configuration	2026-01-20 13:42:22 +08:00
balibabu	80612bc992	Refactor: Replace antd with shadcn (#12718 ) ### What problem does this PR solve? Refactor: Replace antd with shadcn ### Type of change - [x] Refactoring	2026-01-20 13:38:54 +08:00
Kevin Hu	927db0b373	Refa: asyncio.to_thread to ThreadPoolExecutor to break thread limitat… (#12716 ) ### Type of change - [x] Refactoring	2026-01-20 13:29:37 +08:00
lys1313013	120648ac81	fix: inaccurate error message when uploading multiple files containing an unsupported file type (#12711 ) ### What problem does this PR solve? When uploading multiple files at once, if any of the files are of an unsupported type and the blob is not removed, it triggers a TypeError('Object of type bytes is not JSON serializable') exception. This prevents the frontend from responding properly. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-20 12:24:54 +08:00
E.G	f367189703	fix(raptor): handle missing vector fields gracefully (#12713 ) ## Summary This PR fixes a `KeyError` crash when running RAPTOR tasks on documents that don't have the expected vector field. ## Related Issue Fixes https://github.com/infiniflow/ragflow/issues/12675 ## Problem When running RAPTOR tasks, the code assumes all chunks have the vector field `q_<size>_vec` (e.g., `q_1024_vec`). However, chunks may not have this field if: 1. They were indexed with a different embedding model (different vector size) 2. The embedding step failed silently during initial parsing 3. The document was parsed before the current embedding model was configured This caused a crash: ``` KeyError: 'q_1024_vec' ``` ## Solution Added defensive validation in `run_raptor_for_kb()`: 1. Check for vector field existence before accessing it 2. Skip chunks that don't have the required vector field instead of crashing 3. Log warnings for skipped chunks with actionable guidance 4. Provide informative error messages suggesting users re-parse documents with the current embedding model 5. Handle both scopes (`file` and `kb` modes) ## Changes - `rag/svr/task_executor.py`: Added validation and error handling in `run_raptor_for_kb()` ## Testing 1. Create a knowledge base with an embedding model 2. Parse documents 3. Change the embedding model to one with a different vector size 4. Run RAPTOR task 5. Before: Crashes with `KeyError` 6. After: Gracefully skips incompatible chunks with informative warnings --- <!-- Gittensor Contribution Tag: @GlobalStar117 --> Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>	2026-01-20 12:24:20 +08:00
writinwaters	1b1554c563	Docs: Added ingestion pipeline quickstart (#12708 ) ### What problem does this PR solve? Added ingestion pipeline quickstart ### Type of change - [x] Documentation Update	2026-01-20 09:48:32 +08:00
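
Commit bc7b864a6c (#12753) in the log above says the `top_k` limit is now applied to the total results before pagination, with a default of 1024. The ordering it describes could be sketched roughly as follows. The function name `paginate_top_k`, the `(chunk_id, score)` input shape, and the 1-based page index are assumptions made for illustration; this is not the code in rag/nlp/search.py.

```python
from typing import List, Tuple

DEFAULT_TOP_K = 1024  # default value named in the commit message


def paginate_top_k(
    scored_ids: List[Tuple[str, float]],
    page: int,
    page_size: int,
    top_k: int = DEFAULT_TOP_K,
) -> List[str]:
    """Rank by score, keep at most top_k results, then return one page.

    Hypothetical helper: `page` is assumed to be 1-based.
    """
    # Highest score first.
    ranked = sorted(scored_ids, key=lambda item: item[1], reverse=True)
    # Apply the global limit BEFORE slicing out a page, so that later
    # pages can never reach past the top_k best results.
    limited = ranked[:top_k]
    start = (max(page, 1) - 1) * page_size
    return [chunk_id for chunk_id, _ in limited[start:start + page_size]]
```

With four scored chunks and `top_k=3`, the second page of size 2 holds only the third-best chunk, and a third page is empty, whereas paginating first would have exposed all four.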


5137 Commits
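
Commit f367189703 (#12713) in the log above describes skipping chunks that lack the `q_<size>_vec` field instead of raising `KeyError`, and logging a warning that suggests re-parsing. A minimal sketch of that defensive check is given below, assuming chunks are plain dictionaries. The helper name `select_chunks_with_vector` and the warning wording are hypothetical, and this is not the code in `run_raptor_for_kb()`.

```python
import logging
from typing import Dict, List, Tuple

logger = logging.getLogger(__name__)


def select_chunks_with_vector(
    chunks: List[Dict], vector_size: int
) -> Tuple[List[Dict], int]:
    """Return the chunks that carry q_<size>_vec and the number skipped.

    Hypothetical helper illustrating the check described in the commit.
    """
    field = f"q_{vector_size}_vec"  # e.g. "q_1024_vec"
    # Test for the key rather than indexing it, so a missing field
    # leads to a skip instead of a KeyError.
    usable = [chunk for chunk in chunks if field in chunk]
    skipped = len(chunks) - len(usable)
    if skipped:
        logger.warning(
            "Skipped %d chunk(s) without field %s; re-parsing the documents "
            "with the current embedding model may restore them.",
            skipped,
            field,
        )
    return usable, skipped
```

Returning the skipped count alongside the usable chunks lets the caller decide whether to continue with a partial set or stop with an error when nothing usable remains.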