### What problem does this PR solve?
Modifying a document’s parser config previously left behind obsolete
chunk images. If the dataset isn’t manually deleted, these images
accumulate and waste storage. This PR fixes the issue by automatically
removing associated images when the parser config changes.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Handle 404 exception when init memory size from es.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Liu An <asiro@qq.com>
### What problem does this PR solve?
Judge retrieval from in retrieval component, and fix bug in message
component
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix several text issues.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Manage message and use in agent.
Issue #4213
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Supports filter documents by empty metadata
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add document metadata setting.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Deduplicate metadata lists during updates.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: /kb/update does not update FileService
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
change:
enhance webhook response to include status and success fields and
simplify ReAct agent
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Refactor implementation of add_llm
2. Add speech to text model.
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Document list and filter supports metadata filtering.
**OR within the same field, AND across different fields**
Example 1 (multi-field AND):
```markdown
Doc1 metadata: { "a": "b", "as": ["a", "b", "c"] }
Doc2 metadata: { "a": "x", "as": ["d"] }
Query:
metadata = {
"a": ["b"],
"as": ["d"]
}
Result:
Doc1 matches a=b but not as=d → excluded
Doc2 matches as=d but not a=b → excluded
Final result: empty
```
Example 2 (same field OR):
```markdown
Doc1 metadata: { "as": ["a", "b", "c"] }
Doc2 metadata: { "as": ["d"] }
Query:
metadata = {
"as": ["a", "d"]
}
Result:
Doc1 matches as=a → included
Doc2 matches as=d → included
Final result: Doc1 + Doc2
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
After a file in the file list is associated with a knowledge base, the
knowledge base document ID is returned
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Chats completions API supports metadata filtering.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
issue:
https://github.com/infiniflow/ragflow/issues/10427https://github.com/infiniflow/ragflow/issues/8115
change:
- Support for Multiple HTTP Methods (POST / GET / PUT / PATCH / DELETE /
HEAD)
- Security Validation
1. max_body_size
2. IP whitelist
3. rate limit
4. token / basic / jwt authentication
- File Upload Support
- Unified Content-Type Handling
- Full Schema-Based Extraction & Type Validation
- Two Execution Modes: Immediately / Streaming
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Trace information can be returned by the agent completion API.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Reject default admin account log in to normal services
#11854#11673
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
…tant, but model is available via UI
Fix: add multimodel models in chat api
Fixes#8549
### What problem does this PR solve?
Add a parameter model_type in chat api.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
Feat: Enable image edit in edit_chunk
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### PR details
feat: Add Excel export support and fix variable reference regex
Changes:
- Add Excel export output format option to Message component
- Apply nest_asyncio patch to handle nested event loops
- Fix async generator iteration in canvas_app.py debug endpoint
- Add underscore support in variable reference regex pattern
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Shivam Johri <shivamjohri@Shivams-MacBook-Air.local>
我已在下面的评论中用中文重复说明。
### What problem does this PR solve?
## Summary
This PR enhances the MinerU document parser with additional
configuration options, giving users more control over PDF parsing
behavior and improving support for multilingual documents.
## Changes
### Backend (`deepdoc/parser/mineru_parser.py`)
- Added configurable parsing options:
- **Parse Method**: `auto`, `txt`, or `ocr` — allows users to choose the
extraction strategy
- **Formula Recognition**: Toggle for enabling/disabling formula
extraction (useful to disable for Cyrillic documents where it may cause
issues)
- **Table Recognition**: Toggle for enabling/disabling table extraction
- Added language code mapping (`LANGUAGE_TO_MINERU_MAP`) to translate
RAGFlow language settings to MinerU-compatible language codes for better
OCR accuracy
- Improved parser configuration handling to pass these options through
the processing pipeline
### Frontend (`web/`)
- Created new `MinerUOptionsFormField` component that conditionally
renders when MinerU is selected as the layout recognition engine
- Added UI controls for:
- Parse method selection (dropdown)
- Formula recognition toggle (switch)
- Table recognition toggle (switch)
- Added i18n translations for English and Chinese
- Integrated the options into both the dataset creation dialog and
dataset settings page
### Integration
- Updated `rag/app/naive.py` to forward MinerU options to the parser
- Updated task service to handle the new configuration parameters
## Why
MinerU is a powerful document parser, but the default settings don't
work well for all document types. This PR allows users to:
1. Choose the best parsing method for their documents
2. Disable formula recognition for Cyrillic/non-Latin scripts where it
causes issues
3. Control table extraction based on document needs
4. Benefit from automatic language detection for better OCR results
## Testing
- [x] Tested MinerU parsing with different parse methods
- [x] Verified UI renders correctly when MinerU is selected/deselected
- [x] Confirmed settings persist correctly in dataset configuration
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: user210 <user210@rt>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Refactor metadata filter.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Correct metadata update behavior. #11912
When update `value` is omitted, the corresponding keys are updated to
`"value"` regardless of their current values.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)