### What problem does this PR solve?
issue:
[#10890](https://github.com/infiniflow/ragflow/issues/10890)
change:
missing embedding vector on Tokenizer
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
change:
wrong describe_with_prompt() in ollama
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: UnicodeDecodeError: 'gb2312' codec can't decode byte 0xab in
position 560: illegal multibyte sequence.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: parsing hyperlinks in docx and pdf #10848
Fix: default parser config of toc extraction
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add get_uuid, download_img and hash_str2int into misc_utils.py
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Video parser should follow selected VLM, rather than default one.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
issue:
[#10890](https://github.com/infiniflow/ragflow/issues/10890)
change:
enhance delimiters in markdown parser
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
support local mineru api in docker instance. like no gpu in wsl on
windows, but has mineru api with gpu support.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: opensearch retrieval error #10828
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Allow initialize Redis without password.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
- rename rmSpace to remove_redundant_spaces
- move clean_markdown_block to common module
- add unit tests for remove_redundant_spaces and clean_markdown_block
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: parsing excel with chartsheet #10815
Fix: Clamp begin to a minimum of 0 to prevent negative indexing #10804
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
MinerU supports VLM-Transfomers backend.
Set `MINERU_BACKEND="pipeline"` to choose the backend. (Options:
pipeline | vlm-transformers, default is pipeline)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
This PR adds a new TCADP (Tencent Cloud Advanced Document Processing)
parser to RAGFlow, enabling users to leverage Tencent Cloud's document
parsing capabilities for more accurate and structured document
processing. The implementation includes:
New TCADP Parser: A complete implementation of Tencent Cloud's document
parsing API without SDK dependency
Configuration Support: Added configuration options in service_conf.yaml
for Tencent Cloud API credentials
Frontend Integration: Updated UI components to support the new TCADP
parser option
Error Handling: Comprehensive error handling and retry mechanisms for
API calls
Result Processing: Support for both SSE streaming and JSON response
formats from Tencent Cloud API
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Fix: prio synonym match than wordnet for english
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Introduced gpu profile in .env
Added Dockerfile_tei
fix datrie
Removed LIGHTEN flag
### Type of change
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
issue:
#3945
change:
add Docling parser
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Pipeline supports MinerU PDF parser.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: can't upload image in ollama model #10447
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### Change all `image=[]` to `image = None`
Changing `image=[]` to `images=None` avoids Python’s mutable default
parameter issue.
If you keep `images=[]`, all calls share the same list, so modifying it
(e.g., images.append()) will affect later calls.
Using images=None and creating a new list inside the function ensures
each call is independent.
This change does not affect current behavior — it simply makes the code
safer and more predictable.
把 `images=[]` 改成 `images=None` 是为了避免 Python 默认参数的可变对象问题。
如果保留 `images=[]`,所有调用都会共用同一个列表,一旦修改就会影响后续调用。
改成 None 并在函数内部重新创建列表,可以确保每次调用都是独立的。
这个修改不会影响现有运行结果,只是让代码更安全、更可控。
### What problem does this PR solve?
Fix potential negative max_tokens in RAPTOR. #10235.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue
### What problem does this PR solve?
issue:
[#7472](https://github.com/infiniflow/ragflow/issues/7472)
change:
Vision Model Image Enhancement in Manual chunker
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Qwen-VL series supports video parsing. #10617.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)