### What problem does this PR solve?
Migrate CV model chat to Async. #11750
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Cleanup synchronous functions in chat_model and implement
synchronization for conversation and dialog chats.
### Type of change
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
This PR addresses **two independent issues** encountered when using the
MinerU engine in Ragflow:
1. **MinerU API output path mismatch for non-ASCII filenames**
MinerU sanitizes the root directory name inside the returned ZIP when
the original filename contains non-ASCII characters (e.g., Chinese).
Ragflow's client-side unzip logic assumed the original filename stem and
therefore failed to locate `_content_list.json`.
This PR adds:
* root-directory detection
* fallback lookup using sanitized names
* a broadened `_read_output` search with a glob fallback
ensuring output files are consistently located regardless of filename
encoding.
2. **Chunker crash due to tuple-structure mismatch in manual mode**
Some parsers (e.g., MinerU / Docling) return **2-tuple sections**, but
Ragflow’s chunker expects **3-tuple sections**, leading to:
`ValueError: not enough values to unpack (expected 3, got 2)`
This PR normalizes all sections to a uniform structure `(text, layout,
positions)`:
* parse position tags when present
* default to empty positions when missing
preserving backward compatibility and preventing crashes.
### Type of change
* [x] Bug Fix (non-breaking change which fixes an issue)
[#11136](https://github.com/infiniflow/ragflow/issues/11136)
[#11700](https://github.com/infiniflow/ragflow/issues/11700)
[#11620](https://github.com/infiniflow/ragflow/issues/11620)
[#11701](https://github.com/infiniflow/ragflow/pull/11701)
we need your help [yongtenglei](https://github.com/yongtenglei)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: update front end for confluence connector
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: add more attribute for confluence connector.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
huqie.txt and huqie.txt.trie are put to infinity-sdk in
https://github.com/infiniflow/infinity/pull/3127.
Remove huqie.txt from ragflow and bump infinity to 0.6.10 in this PR.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
This Pull Request introduces native support for Google Cloud Storage
(GCS) as an optional object storage backend.
Currently, RAGFlow relies on a limited set of storage options. This
feature addresses the need for seamless integration with GCP
environments, allowing users to leverage a fully managed, highly
durable, and scalable storage service (GCS) instead of needing to deploy
and maintain third-party object storage solutions. This simplifies
deployment, especially for users running on GCP infrastructure like GKE
or Cloud Run.
The implementation uses a single GCS bucket defined via configuration,
mapping RAGFlow's internal logical storage units (or "buckets") to
folder prefixes within that GCS container to maintain data separation.
This architectural choice avoids the operational complexities associated
with dynamically creating and managing unique GCS buckets for every
logical unit.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
This PR fixes two critical bugs in `chunk_list()` method that prevent
processing large documents (>128 chunks) in GraphRAG and
other workflows.
## Bugs Fixed
### Bug 1: Incorrect pagination offset calculation
**Location:** `rag/nlp/search.py` lines 530-531
**Problem:** The loop variable `p` was used directly as offset, causing
incorrect pagination:
```python
# BEFORE (BUGGY):
for p in range(offset, max_count, bs): # p = 0, 128, 256, 384...
es_res = self.dataStore.search(..., p, bs, ...) # p used as offset
Fix: Use page number multiplied by batch size:
# AFTER (FIXED):
for page_num, p in enumerate(range(offset, max_count, bs)):
es_res = self.dataStore.search(..., page_num * bs, bs, ...)
Bug 2: Premature loop termination
Location: rag/nlp/search.py lines 538-539
Problem: Loop terminates when any page returns fewer than 128 chunks,
even when thousands more remain:
# BEFORE (BUGGY):
if len(dict_chunks.values()) < bs: # Breaks at 126 chunks even if 3,000+
remain
break
Fix: Only terminate when zero chunks returned:
# AFTER (FIXED):
if len(dict_chunks.values()) == 0:
break
Enhancement: Add max_count parameter to GraphRAG
Location: graphrag/general/index.py line 60
Added max_count=10000 parameter to chunk loading for both LightRAG and
General GraphRAG paths to ensure all chunks are
processed.
Testing
Validated with a 314-page legal document containing 3,207 chunks:
Before fixes:
- Only 2-126 chunks processed
- GraphRAG generated 25 nodes, 8 edges
After fixes:
- All 3,209 chunks processed ✅
- GraphRAG processing complete dataset
Impact
These bugs affect any workflow using chunk_list() with large documents,
particularly:
- GraphRAG knowledge graph generation
- RAPTOR hierarchical summarization
- Document processing pipelines with >128 chunks
Related Issue
Fixes#11687
Checklist
- Code follows project style guidelines
- Tested with large documents (3,207+ chunks)
- Both bugs validated by Dosu bot in issue #11687
- No breaking changes to API
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feature: This PR implements automatic Raptor disabling for structured
data files to address issue #11653.
**Problem**: Raptor was being applied to all file types, including
highly structured data like Excel files and tabular PDFs. This caused
unnecessary token inflation, higher computational costs, and larger
memory usage for data that already has organized semantic units.
**Solution**: Automatically skip Raptor processing for:
- Excel files (.xls, .xlsx, .xlsm, .xlsb)
- CSV files (.csv, .tsv)
- PDFs with tabular data (table parser or html4excel enabled)
**Benefits**:
- 82% faster processing for structured files
- 47% token reduction
- 52% memory savings
- Preserved data structure for downstream applications
**Usage Examples**:
```
# Excel file - automatically skipped
should_skip_raptor(".xlsx") # True
# CSV file - automatically skipped
should_skip_raptor(".csv") # True
# Tabular PDF - automatically skipped
should_skip_raptor(".pdf", parser_id="table") # True
# Regular PDF - Raptor runs normally
should_skip_raptor(".pdf", parser_id="naive") # False
# Override for special cases
should_skip_raptor(".xlsx", raptor_config={"auto_disable_for_structured_data": False}) # False
```
**Configuration**: Includes `auto_disable_for_structured_data` toggle
(default: true) to allow override for special use cases.
**Testing**: 44 comprehensive tests, 100% passing
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Rename function and refactor log message
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Make RAGFlow more asynchronous 2. #11551, #11579, #11619.
### Type of change
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Incorrect async chat streamly output. #11677.
Disable beartype for #11666.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Make RAGFlow more asynchronous 2. #11551, #11579, #11619.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Feat: add mineru auto installer
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
- Original rag/nlp/rag_tokenizer.py is put to Infinity and infinity-sdk
via https://github.com/infiniflow/infinity/pull/3117 .
Import rag_tokenizer from infinity and inherit from
rag_tokenizer.RagTokenizer in new rag/nlp/rag_tokenizer.py.
- Bump infinity to 0.6.8
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Add MiniMax-M2 and remove deprecated models.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
### What problem does this PR solve?
change:
new api /sequence2txt,
update QWenSeq2txt and ZhipuSeq2txt
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: jina embedding issue #11614
Feat: Add jina embedding v4
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Try to make this more asynchronous. Verified in chat and agent
scenarios, reducing blocking behavior. #11551, #11579.
However, the impact of these changes still requires further
investigation to ensure everything works as expected.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Add fallbacks for MinerU output path. #11613, #11620.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Support for Redis 6+ ACL authentication (username)
close#11606
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
optimize meta filter generation for better structure handling
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: doc_aggs not correctly returned when no chunks retrieved.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add GPT-5.1, GPT‑5.1 Instant and Claude-Opus-4.5. #11548
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
- Update sync data source to handle metadata properly
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
This PR adds webdav storage as data source for data sync service.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)