### What problem does this PR solve?
Fix: create dataset return type inconsistent #11167
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes an issue where default models which used the same factory but
different base URLs would all be initialised with the default chat
model's base URL and would ignore e.g. the embedding model's base URL
config.
For example, with the following service config, the embedding and
reranker models would end up using the base URL for the default chat
model (i.e. `llm1.example.com`):
```yaml
ragflow:
service_conf:
user_default_llm:
factory: OpenAI-API-Compatible
api_key: not-used
default_models:
chat_model:
name: llm1
base_url: https://llm1.example.com/v1
embedding_model:
name: llm2
base_url: https://llm2.example.com/v1
rerank_model:
name: llm3
base_url: https://llm3.example.com/v1/rerank
llm_factories:
factory_llm_infos:
- name: OpenAI-API-Compatible
logo: ""
tags: "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION"
status: "1"
llm:
- llm_name: llm1
base_url: 'https://llm1.example.com/v1'
api_key: not-used
tags: "LLM,CHAT,IMAGE2TEXT"
max_tokens: 100000
model_type: chat
is_tools: false
- llm_name: llm2
base_url: https://llm2.example.com/v1
api_key: not-used
tags: "TEXT EMBEDDING"
max_tokens: 10000
model_type: embedding
- llm_name: llm3
base_url: https://llm3.example.com/v1/rerank
api_key: not-used
tags: "RERANK,1k"
max_tokens: 10000
model_type: rerank
```
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add the specified parent_path to the document upload api interface
(#11230)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: virgilwong <hyhvirgil@gmail.com>
### What problem does this PR solve?
GraphRAG and RAPTOR tasks do not affect document status.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
#10056
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Optimize Prompts and Regex for use_sql() #11127
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
```
admin> show version;
show_version
+-----------------------+
| version |
+-----------------------+
| v0.21.0-241-gc6cf58d5 |
+-----------------------+
admin> \q
Goodbye!
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
1. Move EMBEDDING_CFG to common.globals
2. Fix error imports
3. Move signal handles to common/signal_utils.py
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
1. Rename identifier name
2. Fix some return statement
3. Fix some typos
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: Create dataset performance unmatched between HTTP api and web ui
#10925
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: parsing hyperlinks in docx and pdf #10848
Fix: default parser config of toc extraction
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add get_uuid, download_img and hash_str2int into misc_utils.py
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: the input length exceeds the context length #10750
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Add time utilities and unit tests
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
- rename rmSpace to remove_redundant_spaces
- move clean_markdown_block to common module
- add unit tests for remove_redundant_spaces and clean_markdown_block
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Introduced gpu profile in .env
Added Dockerfile_tei
fix datrie
Removed LIGHTEN flag
### Type of change
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
Fix: can't upload image in ollama model #10447
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### Change all `image=[]` to `image = None`
Changing `image=[]` to `images=None` avoids Python’s mutable default
parameter issue.
If you keep `images=[]`, all calls share the same list, so modifying it
(e.g., images.append()) will affect later calls.
Using images=None and creating a new list inside the function ensures
each call is independent.
This change does not affect current behavior — it simply makes the code
safer and more predictable.
把 `images=[]` 改成 `images=None` 是为了避免 Python 默认参数的可变对象问题。
如果保留 `images=[]`,所有调用都会共用同一个列表,一旦修改就会影响后续调用。
改成 None 并在函数内部重新创建列表,可以确保每次调用都是独立的。
这个修改不会影响现有运行结果,只是让代码更安全、更可控。
### What problem does this PR solve?
Feat: Support attribute filtering #8703
### Type of change
- [X] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: writinwaters <cai.keith@gmail.com>
### What problem does this PR solve?
File: Now parsing support all types of embedded documents, solved #10059
Fix: Incomplete words in chat #10530
### Type of change
- [x] New Feature (non-breaking change which adds functionality)