Commit Graph

4752 Commits

Author SHA1 Message Date
151480dc85 Feat: trace information can be returned by the agent completion API (#12019)
### What problem does this PR solve?

Trace information can be returned by the agent completion API.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-18 15:52:11 +08:00
2331b3a270 Refact: Update loggings (#12014)
### What problem does this PR solve?

Refact: Update loggings

### Type of change

- [x] Refactoring
2025-12-18 14:18:03 +08:00
5cd1a678c8 Fix: image edit in edit_chunk (#12009)
### What problem does this PR solve?

Fix: image edit in edit_chunk #11971

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-18 11:35:01 +08:00
cc9546b761 Fix IDE warnings (#12010)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-18 11:27:02 +08:00
a63dcfed6f Refactor: improve cohere calculate total counts (#12007)
### What problem does this PR solve?

improve cohere calculate total counts

### Type of change


- [x] Refactoring
2025-12-18 10:04:28 +08:00
4dd8cdc38b task executor issues (#12006)
### What problem does this PR solve?

**Fixes #8706** - `InfinityException: TOO_MANY_CONNECTIONS` when running
multiple task executor workers

### Problem Description

When running RAGFlow with 8-16 task executor workers, most workers fail
to start properly. Checking logs revealed that workers were
stuck/hanging during Infinity connection initialization - only 1-2
workers would successfully register in Redis while the rest remained
blocked.

### Root Cause

The Infinity SDK `ConnectionPool` pre-allocates all connections in
`__init__`. With the default `max_size=32` and multiple workers (e.g.,
16), this creates 16×32=512 connections immediately on startup,
exceeding Infinity's default 128 connection limit. Workers hang while
waiting for connections that can never be established.

### Changes

1. **Prevent Infinity connection storm** (`rag/utils/infinity_conn.py`,
`rag/svr/task_executor.py`)
- Reduced ConnectionPool `max_size` from 32 to 4 (sufficient since
operations are synchronous)
- Added staggered startup delay (2s per worker) to spread connection
initialization

2. **Handle None children_delimiter** (`rag/app/naive.py`)
   - Use `or ""` to handle explicitly set None values from parser config

3. **MinerU parser robustness** (`deepdoc/parser/mineru_parser.py`)
   - Use `.get()` for optional output fields that may be missing
- Fix DISCARDED block handling: change `pass` to `continue` to skip
discarded blocks entirely

### Why `max_size=4` is sufficient

| Workers | Pool Size | Total Connections | Infinity Limit |
|---------|-----------|-------------------|----------------|
| 16      | 32        | 512               | 128          |
| 16      | 4         | 64                | 128          |
| 32      | 4         | 128               | 128          |

- All RAGFlow operations are synchronous: `get_conn()` → operation →
`release_conn()`
- No parallel `docStoreConn` operations in the codebase
- Maximum 1-2 concurrent connections needed per worker; 4 provides
safety margin

### MinerU DISCARDED block bug

When MinerU returns blocks with `type: "discarded"` (headers, footers,
watermarks, page numbers, artifacts), the previous code used `pass`
which left the `section` variable undefined, causing:

- **UnboundLocalError** if DISCARDED is the first block
- **Duplicate content** if DISCARDED follows another block (stale value
from previous iteration)

**Root cause confirmed via MinerU source code:**

From
[`mineru/utils/enum_class.py`](https://github.com/opendatalab/MinerU/blob/main/mineru/utils/enum_class.py#L14):
```python
class BlockType:
    DISCARDED = 'discarded'
    # VLM 2.5+ also has: HEADER, FOOTER, PAGE_NUMBER, ASIDE_TEXT, PAGE_FOOTNOTE
```

Per [MinerU
documentation](https://opendatalab.github.io/MinerU/reference/output_files/),
discarded blocks contain content that should be filtered out for clean
text extraction.

**Fix:** Changed `pass` to `continue` to skip discarded blocks entirely.

### Testing

- Verified all 16 workers now register successfully in Redis
- All workers heartbeating correctly
- Document parsing works as expected
- MinerU parsing with DISCARDED blocks no longer crashes

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: user210 <user210@rt>
2025-12-18 10:03:30 +08:00
1a4822d6be Refactor: Improve the timestamp consistency (#11942)
### What problem does this PR solve?

Improve the timestamp consistency 

### Type of change
- [x] Refactoring
2025-12-18 09:40:33 +08:00
ce161f09cc feat: add image uploader in edit chunk dialog (#12003)
### What problem does this PR solve?

Add image uploader in edit chunk dialog for replacing image chunk

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-18 09:33:52 +08:00
672958a192 Fix: model not authorized (#12001)
### What problem does this PR solve?

Fix model not authorized. #11973.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-17 19:48:24 +08:00
3820de916c Fix: duplicated PDF parser (#12000)
### What problem does this PR solve?

Fix duplicated PDF parser.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-17 19:48:10 +08:00
ef44979b5c Fix table format warning in Markdown file (#12002)
### What problem does this PR solve?

As title

### Type of change

- [x] Documentation Update
- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-17 19:27:47 +08:00
d38f8a1562 Add license and Fix IDE warnings (#11985)
### What problem does this PR solve?

- Add license
- Fix IDE warnings

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-12-17 17:04:44 +08:00
8e4d011b15 Fix: parent-children chunking method. (#11997)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-12-17 16:50:36 +08:00
7baa67dfe8 Feat: Reject default admin account log in to normal services (#11994)
### What problem does this PR solve?

Feat: Reject default admin account log in to normal services
#11854
#11673

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-17 16:29:20 +08:00
e58271ef76 feat: add toc option in transformer node in ingestion pipeline (#11992)
### What problem does this PR solve?

Add TOC (Table of contents) option in Ingestion Pipeline canvas >
Transformer node

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-17 15:51:55 +08:00
4fd4a41e7c Fix: add multimodel models in chat api (#11986)
…tant, but model is available via UI

Fix: add multimodel models in chat api
Fixes #8549

### What problem does this PR solve?

Add a parameter model_type in chat api.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-12-17 15:46:43 +08:00
82d4e5fb87 Ref: update loggings (#11987)
### What problem does this PR solve?

Ref: update loggins

### Type of change

- [x] Refactoring
2025-12-17 15:43:25 +08:00
d16643a53d Fix: Fixed the issue of empty memory parameters (#11988)
### What problem does this PR solve?

Fix: Fixed the issue of empty memory parameters

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-17 15:42:29 +08:00
93ca1e0b91 Fix: update document api sample reponse is out of date. (#11989)
### What problem does this PR solve?

Fix: update document api sample reponse is out of date.

### Type of change

- [x] Documentation Update
2025-12-17 15:39:12 +08:00
4046bffaf1 fix: unable to save ingestion pipeline config without modifying children delimiter (#11991)
…ildren delimiter

### What problem does this PR solve?

Fix the issue of unable to save **Files > Ingestion Pipeline (Modal)**
config without modifying children delimiter

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-17 15:37:28 +08:00
03f9be7cbb Refa: only support MinerU-API now (#11977)
### What problem does this PR solve?

Only support MinerU-API now, still need to complete frontend for
pipeline to allow the configuration of MinerU options.

### Type of change

- [x] Refactoring
2025-12-17 12:58:48 +08:00
5e05f43c3d Update default prompt (#11984)
### What problem does this PR solve?

New default prompt:

```
You are an intelligent assistant. Your primary function is to answer questions based strictly on the provided knowledge base.

**Essential Rules:**
- Your answer must be derived **solely** from this knowledge base: `{knowledge}`.
- **When information is available**: Summarize the content to give a detailed answer.
- **When information is unavailable**: Your response must contain this exact sentence: "The answer you are looking for is not found in the knowledge base!"
- **Always consider** the entire conversation history.
```

Also fix some grammar errors.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-17 12:57:24 +08:00
205a6483f5 Feature:memory function complete (#11982)
### What problem does this PR solve?

memory function complete

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-17 12:35:26 +08:00
2595644dfd feat: add ingestion pipeline children delimiters configs (#11979)
### What problem does this PR solve?

Add children delimiters for Ingestion pipeline config

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-17 11:18:54 +08:00
30019dab9f Change knowledge base to dataset (#11976)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-17 10:03:33 +08:00
4d46726eb7 Docs: Updated executor manager prerequisites (#11978)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-12-16 19:59:56 +08:00
0e8b9588ba Fix error and format issue (#11975)
### What problem does this PR solve?

1. Fix error of book chunking.
2. Fix format issues.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-16 19:29:37 +08:00
344a106eba Feat: Enable image edit in edit_chunk (#11971)
### What problem does this PR solve?

Feat: Enable image edit in edit_chunk

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-16 17:57:00 +08:00
bccad7b4a8 Docs: Migrate to single bucket mode (contributed by community) (#11972)
### What problem does this PR solve?

Update

### Type of change

- [x] Documentation Update
2025-12-16 16:31:47 +08:00
f7926724aa Fix security issue (#11965)
### What problem does this PR solve?

- CVE-2024-6866
- CVE-2024-6844
- CVE-2024-6839

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-16 13:31:45 +08:00
5bba562048 Feature/excel export fix (#11914)
### PR details 
feat: Add Excel export support and fix variable reference regex
Changes:
- Add Excel export output format option to Message component
- Apply nest_asyncio patch to handle nested event loops
- Fix async generator iteration in canvas_app.py debug endpoint
- Add underscore support in variable reference regex pattern


### What problem does this PR solve?



### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Shivam Johri <shivamjohri@Shivams-MacBook-Air.local>
2025-12-16 13:15:52 +08:00
49c74d08e8 Feature/mineru improvements (#11938)
我已在下面的评论中用中文重复说明。

### What problem does this PR solve?

## Summary
This PR enhances the MinerU document parser with additional
configuration options, giving users more control over PDF parsing
behavior and improving support for multilingual documents.

## Changes

### Backend (`deepdoc/parser/mineru_parser.py`)
- Added configurable parsing options:
- **Parse Method**: `auto`, `txt`, or `ocr` — allows users to choose the
extraction strategy
- **Formula Recognition**: Toggle for enabling/disabling formula
extraction (useful to disable for Cyrillic documents where it may cause
issues)
- **Table Recognition**: Toggle for enabling/disabling table extraction
- Added language code mapping (`LANGUAGE_TO_MINERU_MAP`) to translate
RAGFlow language settings to MinerU-compatible language codes for better
OCR accuracy
- Improved parser configuration handling to pass these options through
the processing pipeline

### Frontend (`web/`)
- Created new `MinerUOptionsFormField` component that conditionally
renders when MinerU is selected as the layout recognition engine
- Added UI controls for:
  - Parse method selection (dropdown)
  - Formula recognition toggle (switch)
  - Table recognition toggle (switch)
- Added i18n translations for English and Chinese
- Integrated the options into both the dataset creation dialog and
dataset settings page

### Integration
- Updated `rag/app/naive.py` to forward MinerU options to the parser
- Updated task service to handle the new configuration parameters

## Why
MinerU is a powerful document parser, but the default settings don't
work well for all document types. This PR allows users to:
1. Choose the best parsing method for their documents
2. Disable formula recognition for Cyrillic/non-Latin scripts where it
causes issues
3. Control table extraction based on document needs
4. Benefit from automatic language detection for better OCR results

## Testing
- [x] Tested MinerU parsing with different parse methods
- [x] Verified UI renders correctly when MinerU is selected/deselected
- [x] Confirmed settings persist correctly in dataset configuration

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: user210 <user210@rt>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-12-16 13:15:25 +08:00
1112b6291b Remove unused py module dependencies (#11964)
### What problem does this PR solve?

```
pip list --not-required
Package                      Version
---------------------------- ---------------------
aiosmtplib                   5.0.0
akshare                      1.17.94
anthropic                    0.34.1
arxiv                        2.1.3
Aspose.Slides                24.7.0
atlassian-python-api         4.0.7
azure-identity               1.17.1
azure-storage-file-datalake  12.16.0
bio                          1.7.1
boxsdk                       10.2.0
captcha                      0.7.1
cn2an                        0.5.22
cohere                       5.6.2
Crawl4AI                     0.4.247
dashscope                    1.20.11
deepl                        1.18.0
demjson3                     3.0.6
discord.py                   2.3.2
dropbox                      12.0.2
duckduckgo_search            7.5.5
editdistance                 0.8.1
elasticsearch-dsl            8.12.0
exceptiongroup               1.3.1
extract-msg                  0.55.0
ffmpeg-python                0.2.0
flasgger                     0.9.7.1
Flask-Cors                   5.0.0
Flask-Login                  0.6.3
Flask-Mail                   0.10.0
Flask-Session                0.8.0
google-auth-oauthlib         1.2.3
google-genai                 1.55.0
google-generativeai          0.8.5
google_search_results        2.4.2
graspologic                  0.1.dev847+g38e680cab
groq                         0.9.0
grpcio-status                1.67.1
html_text                    0.6.2
imageio-ffmpeg               0.6.0
infinity_emb                 0.0.66
infinity-sdk                 0.6.11
jira                         3.10.5
json_repair                  0.35.0
langfuse                     3.10.5
mammoth                      1.11.0
Markdown                     3.6
markdown_to_json             2.1.1
markdownify                  1.2.2
mcp                          1.19.0
mini-racer                   0.12.4
minio                        7.2.4
mistralai                    0.4.2
moodlepy                     0.24.1
mypy-boto3-s3                1.40.26
Office365-REST-Python-Client 2.6.2
ollama                       0.6.1
onnxruntime-gpu              1.23.2
opencv-python                4.10.0.84
opencv-python-headless       4.10.0.84
opendal                      0.45.20
opensearch-py                2.7.1
ormsgpack                    1.5.0
pdfplumber                   0.10.4
pip                          25.3
pluginlib                    0.9.4
psycopg2-binary              2.9.11
pyclipper                    1.4.0
pycryptodomex                3.20.0
pyobvector                   0.2.18
pyodbc                       5.3.0
pypandoc                     1.16.2
pypdf                        6.4.0
PyPDF2                       3.0.1
python-calamine              0.6.1
python-docx                  1.2.0
python-pptx                  1.0.2
pywencai                     0.13.1
qianfan                      0.4.6
quart-auth                   0.11.0
quart-cors                   0.8.0
ranx                         0.3.20
readability-lxml             0.8.4.1
replicate                    0.31.0
reportlab                    4.4.6
roman-numbers                1.0.2
ruamel.base                  1.0.0
ruamel.yaml                  0.18.16
scholarly                    1.7.11
selenium-wire                5.1.0
slack_sdk                    3.37.0
socksio                      1.0.0
sqlglotrs                    0.9.0
StrEnum                      0.4.15
tavily-python                0.5.1
tencentcloud-sdk-python      3.0.1478
tika                         2.6.0
valkey                       6.0.2
vertexai                     1.70.0
volcengine                   1.0.194
voyageai                     0.2.3
webdav4                      0.10.0
webdriver-manager            4.0.1
wikipedia                    1.4.0
word2number                  1.1
xgboost                      1.6.0
xpinyin                      0.7.6
yfinance                     0.2.65
zhipuai                      2.0.1

```

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-16 12:40:03 +08:00
ef5d1d4b74 Fix: 'AzureEmbed' object has no attribute 'total_token_count_from_response' (#11962)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/11956

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-16 11:29:07 +08:00
a98887d4ca Fix: Bug fixes (#11960)
### What problem does this PR solve?

Fix: Bug fixes

New search popup style modification
Fixed multilingual settings not updating immediately on personal center
page
Changed overlapped percent to percentage format, with maximum value of
30%
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-16 09:44:06 +08:00
7ca3e11566 Update dataset config and retrieval testing (#11958)
### What problem does this PR solve?

1. Refactor the order of the dataset config items.
2. Refactor the text of retrieval test.
3. Refactor typos

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-15 19:56:28 +08:00
a2e080c2d3 feat: display name instead of key in user fillup form submission (#11931)
### What problem does this PR solve?

- Change the message format from 'key: value' to 'name: value' when user
submits the fillup form in agent chat.
- This resolves #11865
- I think this change makes sense, better aligning the form and the
replied message.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-15 19:12:01 +08:00
ad6f7fd4b0 Fix: pipeline ignore MinerU backend config and vllm module is missing (#11955)
### What problem does this PR solve?

Fix pipeline ignore MinerU backend config and vllm module is missing.
#11944, #11947.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-15 18:03:34 +08:00
2a0f835ffe Refactor: Improve the logic to calculate embedding total token count (#11943)
### What problem does this PR solve?

 Improve the logic to calculate embedding total token count 

### Type of change

- [x] Refactoring
2025-12-15 11:33:57 +08:00
13d8241eee Doc: executor manager updated docker version (#11946)
### What problem does this PR solve?

Add documentation for #11806.

### Type of change

- [x] Documentation Update
2025-12-15 11:13:51 +08:00
1ddd11f045 Feat: Set the return value of the webhook to a string. #10427 (#11945)
### What problem does this PR solve?

Feat: Set the return value of the webhook to a string. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-12-15 11:09:08 +08:00
81eb03d230 Support uploading encrypted files to object storage (#11837) (#11838)
### What problem does this PR solve?

Support uploading encrypted files to object storage.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: virgilwong <hyhvirgil@gmail.com>
2025-12-15 09:45:18 +08:00
7d23c3aed0 Fix: presentation parsing & Embedding encode exception handling (#11933)
### What problem does this PR solve?

Fix: presentation parsing #11920
Fix: Embeddin encode exception handling
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-13 11:37:42 +08:00
6be0338aa0 Fix: Asure-OpenAI resource not found (#11934)
### What problem does this PR solve?

Asure-OpenAI resource not found. #11750


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-13 11:32:46 +08:00
44dec89f1f Fix: aspose-slide issue. (#11935)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-12 20:16:18 +08:00
2b260901df Fix: raptor don't have attribute chat (#11936)
### What problem does this PR solve?

Raptor don't have attribute chat.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-12 20:08:18 +08:00
948bc93786 Feat: Add GPT-5.2 & pro (#11929)
### What problem does this PR solve?

Feat: Add GPT-5.2 & pro

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-12 17:35:08 +08:00
0f0fb53256 Refa: refactor metadata filter (#11907)
### What problem does this PR solve?

Refactor metadata filter.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-12-12 17:12:38 +08:00
0fcb1680fd Feat: Displaying the file option in the webhook's request body #10427 (#11928)
### What problem does this PR solve?

Feat: Displaying the file option in the webhook's request body #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-12-12 16:16:34 +08:00
50715ba332 Fix: forget-reset password (#11927)
### What problem does this PR solve?

Fix: forget-reset password

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-12 16:16:17 +08:00