Mirror of https://github.com/infiniflow/ragflow.git (synced 2026-01-23 03:26:53 +08:00)
## Summary

This PR fixes a `KeyError` crash when running RAPTOR tasks on documents whose chunks don't have the expected vector field.

## Related Issue

Fixes https://github.com/infiniflow/ragflow/issues/12675

## Problem

When running RAPTOR tasks, the code assumes every chunk has the vector field `q_<size>_vec` (e.g., `q_1024_vec`). However, a chunk may lack this field if:

1. It was indexed with a **different embedding model** (different vector size).
2. The embedding step **failed silently** during initial parsing.
3. The document was parsed before the current embedding model was configured.

This caused a crash:

```
KeyError: 'q_1024_vec'
```

## Solution

Added defensive validation in `run_raptor_for_kb()`:

1. **Check that the vector field exists** before accessing it.
2. **Skip chunks** that don't have the required vector field instead of crashing.
3. **Log warnings** for skipped chunks with actionable guidance.
4. **Provide informative error messages** suggesting that users re-parse documents with the current embedding model.
5. **Handle both scopes** (`file` and `kb` modes).

## Changes

- `rag/svr/task_executor.py`: Added validation and error handling in `run_raptor_for_kb()`.

## Testing

1. Create a knowledge base with an embedding model.
2. Parse documents.
3. Change the embedding model to one with a different vector size.
4. Run a RAPTOR task.
5. **Before**: Crashes with `KeyError`.
6. **After**: Gracefully skips incompatible chunks with informative warnings.

---

Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>
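The validation described under **Solution** (check, skip, warn, then fail with guidance, for both scopes) can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the code in `rag/svr/task_executor.py`: the helper names `usable_chunks` and `collect_raptor_inputs`, the `text` key on each chunk dict, the `chunks_by_doc` input shape, and the choice to raise `ValueError` when nothing usable remains are all assumptions. Only the `q_<size>_vec` field pattern and the `file`/`kb` scopes come from this PR.

```python
import logging
from typing import Any, Dict, List, Tuple

logger = logging.getLogger(__name__)

Chunk = Dict[str, Any]               # hypothetical chunk shape: {"text": ..., "q_<size>_vec": [...]}
Pair = Tuple[str, List[float]]       # (chunk text, embedding vector)


def usable_chunks(chunks: List[Chunk], vector_size: int, label: str) -> List[Pair]:
    """Return (text, vector) pairs for chunks that carry q_<size>_vec; warn about the rest.

    Hypothetical helper illustrating the PR's idea, not RAGFlow's actual code.
    """
    field = f"q_{vector_size}_vec"  # e.g. q_1024_vec for a 1024-dim embedding model
    usable = [(c["text"], c[field]) for c in chunks if c.get(field) is not None]
    skipped = len(chunks) - len(usable)
    if skipped:
        # Missing field: other embedding model, silent embedding failure, or stale parse.
        logger.warning(
            "%s: skipped %d of %d chunks without %s; re-parse with the current embedding model",
            label, skipped, len(chunks), field,
        )
    return usable


def collect_raptor_inputs(
    scope: str, chunks_by_doc: Dict[str, List[Chunk]], vector_size: int
) -> Dict[str, List[Pair]]:
    """Group usable chunks per document ("file") or for the whole knowledge base ("kb")."""
    if scope == "file":
        inputs: Dict[str, List[Pair]] = {}
        for doc_id, chunks in chunks_by_doc.items():
            usable = usable_chunks(chunks, vector_size, f"document {doc_id}")
            if usable:
                inputs[doc_id] = usable
            else:
                logger.warning("document %s: no usable chunks, skipping RAPTOR for it", doc_id)
    elif scope == "kb":
        all_chunks = [c for chunks in chunks_by_doc.values() for c in chunks]
        usable = usable_chunks(all_chunks, vector_size, "knowledge base")
        inputs = {"kb": usable} if usable else {}
    else:
        raise ValueError(f"unknown RAPTOR scope: {scope!r}")

    if not inputs:
        # Nothing left to cluster: fail with guidance instead of a bare KeyError.
        raise ValueError(
            f"No chunks have field q_{vector_size}_vec. "
            "Re-parse the documents with the current embedding model before running RAPTOR."
        )
    return inputs
```

Skipping per chunk rather than failing the whole task keeps RAPTOR usable on a partially compatible knowledge base, while the warning counts tell the user how many chunks still need re-parsing.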