ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-02-01 08:05:07 +08:00

Author	SHA1	Message	Date
Kevin Hu	24625e0695	Fix: presentation of PDF using vlm. (#8133 ) ### What problem does this PR solve? #8109 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-09 15:01:52 +08:00
Zhichang Yu	1ed0b25910	Fix task_limiter in raptor.py (#8124 ) ### What problem does this PR solve? Fix task_limiter in raptor.py ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-09 10:18:03 +08:00
Kevin Hu	7ed9efcd4e	Fix: QWenCV issue. (#8106 ) ### What problem does this PR solve? Close #8097 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-06 17:55:13 +08:00
Wanderson Pinto dos Santos	0e03542db5	fix: single task executor getting all tasks from Redis queue (#7330 ) ### What problem does this PR solve? Currently, as long as there are tasks in Redis, this loop keeps fetching them. This leads to a single task executor holding many tasks in the pending state, and we then need to wait for the pending tasks to be returned to the queue. If `MAX_CONCURRENT_TASKS` is set to X, then only X tasks should be picked from the queue; the others should be left in the queue for other `task_executors`, or be picked once one of the slots in the current executor becomes free. This PR ensures this behavior. The additional changes were due to the Ruff linting in pre-commit, but I believe these are expected in order to keep the coding style. ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2025-06-06 14:32:35 +08:00
Adrian Altermatt	31d2b3cb5a	Fix: Grammar and clarity improvements in prompt templates (#8023 ) ## Summary Fixed grammar errors and improved clarity in prompt templates throughout `rag/prompts.py`. ## Changes Made - Fixed incomplete sentence: `"If the user's latest question is completely, don't do anything"` → `"If the user's latest question is already complete, don't do anything"` - Improved phrasing: `"of like [ID:i]"` → `"such as [ID:i]"` - Added missing articles: `"give top 3"` → `"give the top 3"` - Fixed prepositions: `"in language of"` → `"in the same language as"` - Corrected spelling: `"Jappanese"` → `"Japanese"` - Standardized formatting: Consistent role descriptions and punctuation ## Impact These changes improve prompt readability and should make instructions clearer for the underlying language models. ## Test Plan - [x] Verified changes maintain original prompt functionality - [x] No breaking changes to prompt structure or expected outputs Co-authored-by: Adrian Altermatt <adrian.altermatt@fgcz.uzh.ch>	2025-06-03 19:41:59 +08:00
Kevin Hu	156290f8d0	Fix: url path join issue. (#8013 ) ### What problem does this PR solve? Close #7980 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 14:18:40 +08:00
zstar	37998abef3	Update synonym dictionary file (#7997 ) ### What problem does this PR solve? Update the synonym dictionary file with relevant time and date to prevent synonyms from being mistakenly escaped. ### Type of change - [x] Refactoring	2025-06-03 09:41:53 +08:00
Kevin Hu	93f5df716f	Fix: order chunks from docx by positions. (#7979 ) ### What problem does this PR solve? #7934 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 17:20:53 +08:00
Yongteng Lei	bd4678bca6	Fix: Unnecessary truncation in markdown parser (#7972 ) ### What problem does this PR solve? Fix unnecessary truncation in markdown parser. So that markdown can work perfectly like [this](https://github.com/infiniflow/ragflow/issues/7824#issuecomment-2921312576) in #7824, supporting multiple special delimiters. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 15:04:21 +08:00
Yongteng Lei	46963ab1ca	Fix: add advanced delimiter detection for naive merge (#7941 ) ### What problem does this PR solve? Add advanced delimiter detection for naive merge. #7824 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-05-29 16:17:22 +08:00
giiiiiithub	6ba5a4348a	set PARALLEL_DEVICES default value= 0 (#7935 ) ### What problem does this PR solve? It would fail if PARALLEL_DEVICES = None in the OCR class, because it passes 0 to the TextDetector and TextRecognizer init methods. It would be simpler to set 0 as the default value for PARALLEL_DEVICES. ### Type of change - [x] Refactoring	2025-05-29 13:32:16 +08:00
Yongteng Lei	0c562f0a9f	Refa: change citation mark as [ID:n] (#7923 ) ### What problem does this PR solve? Change citation mark as [ID:n], it's easier for LLMs to follow the instruction :) #7904 ### Type of change - [x] Refactoring	2025-05-29 10:03:51 +08:00
Stephen Hu	273f36cc54	Perf: reduce upload to minio limiter scope (#7878 ) ### What problem does this PR solve? Reduce the upload_to_minio limiter scope. ### Type of change - [x] Performance Improvement	2025-05-27 17:49:37 +08:00
Kevin Hu	28cb4df127	Fix: raptor overloading (#7889 ) ### What problem does this PR solve? #7840 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-27 17:41:35 +08:00
Kevin Hu	959793e83c	Fix: task limiter issue. (#7873 ) ### What problem does this PR solve? #7869 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-27 11:16:29 +08:00
pyyuhao	5d6bf2224a	Fix: Opensearch chunk management (#7802 ) ### What problem does this PR solve? This PR solves the problems mentioned in the PR (https://github.com/infiniflow/ragflow/pull/7140), which was also submitted by me. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Introduction I fixed the problems that occur when using OpenSearch as the DOC_ENGINE: the pytest failures and the incorrect API return values. These mainly concern deleting, listing, updating, and retrieving chunks. The pytest command "cd sdk/python && uv sync --python 3.10 --group test --frozen && source .venv/bin/activate && cd test/test_http_api && DOC_ENGINE=opensearch pytest test_chunk_management_within_dataset -s --tb=short " now succeeds. ### Others Because Elasticsearch and OpenSearch differ in some respects, some pytest results for OpenSearch are correct and reasonable. However, some pytest params (skipif params) are incompatible, so I changed some of the skipif params. As a search engine programmer, I will continue to focus on the usage of vector databases (especially OpenSearch) for RAG. Thanks for your review.	2025-05-26 16:57:58 +08:00
Kevin Hu	be83074131	Fix: restore task limiter. (#7844 ) ### What problem does this PR solve? Close #7828 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-26 10:59:01 +08:00
Hao Zhang	2f4d803db1	Delete Corresponding Minio Bucket When Deleting a Knowledge Base (#7841 ) ### What problem does this PR solve? Delete Corresponding Minio Bucket When Deleting a Knowledge Base [issue #4113 ](https://github.com/infiniflow/ragflow/issues/4113) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-05-26 10:02:51 +08:00
Sol	0d7cfce6e1	Update rag/nlp/query.py (#7816 ) ### What problem does this PR solve? Fix tokenizer resulting in low recall ![37743d3a495f734aa69f1e173fa77457](https://github.com/user-attachments/assets/1394757e-8fcb-4f87-96af-a92716144884) ![4aba633a17f34269a4e17e84fafb34c4](https://github.com/user-attachments/assets/a1828e32-3e17-4394-a633-ba3f09bd506d) ![image](https://github.com/user-attachments/assets/61308f32-2a4f-44d5-a034-d65bbec554ef) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-05-23 17:13:37 +08:00
Stephen Hu	db4371c745	Fix: Improve First Chunk Size (#7806 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7790 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-23 14:30:19 +08:00
Emmanuel Ferdman	d4a123d6dd	Fix: resolve regex library warnings (#7782 ) ### What problem does this PR solve? This small PR resolves the regex library warnings showing in Python3.11: ```python DeprecationWarning: 'count' is passed as positional argument ``` ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [x] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-05-22 10:06:28 +08:00
Stephen Hu	ce816edb5f	Fix: improve task cancel lag (#7765 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7761 However, it may be difficult to achieve zero delay (which would require passing the cancel token to all parts). Another solution is to show a zero-delay effect in the UI only, with the task stopping later. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-22 09:28:08 +08:00
Stephen Hu	e3e7c7ddaa	Feat: delete useless image blobs when task executor meet edge cases (#7727 ) ### What problem does this PR solve? delete useless image blobs when the task executor meets edge cases ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-21 10:22:30 +08:00
S0b3Rr	5d21cc3660	fix: Fix the problem that concurrent execution limit in task executor fails and causes OOM (issue#7580) (#7700 ) ### What problem does this PR solve? ## Cause of the bug: During the execution process, due to improper use of trio CapacityLimiter, the configuration parameter MAX_CONCURRENT_TASKS is invalid, causing the executor to take out a large number of tasks from the Redis queue at one time. This behavior will cause the task executor to occupy too much memory and be killed by the OS when a large number of tasks exist at the same time. As a result, all executing tasks are suspended. ## Fix: Added the task_manager method to the entry of /rag/svr/task_executor.py to make CapacityLimiter effective. Deleted the invalid async with statement. ## Fix result: After testing, the task executor execution meets expectations, that is: concurrent execution of up to $MAX_CONCURRENT_TASKS tasks. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-05-19 10:25:56 +08:00
Song Fuchang	a1f06a4fdc	Feat: Support tool calling in Generate component (#7572 ) ### What problem does this PR solve? Hello, our use case requires LLM agent to invoke some tools, so I made a simple implementation here. This PR does two things: 1. A simple plugin mechanism based on `pluginlib`: This mechanism lives in the `plugin` directory. It will only load plugins from `plugin/embedded_plugins` for now. A sample plugin `bad_calculator.py` is placed in `plugin/embedded_plugins/llm_tools`, it accepts two numbers `a` and `b`, then give a wrong result `a + b + 100`. In the future, it can load plugins from external location with little code change. Plugins are divided into different types. The only plugin type supported in this PR is `llm_tools`, which must implement the `LLMToolPlugin` class in the `plugin/llm_tool_plugin.py`. More plugin types can be added in the future. 2. A tool selector in the `Generate` component: Added a tool selector to select one or more tools for LLM: ![image](https://github.com/user-attachments/assets/74a21fdf-9333-4175-991b-43df6524c5dc) And with the `bad_calculator` tool, it results this with the `qwen-max` model: ![image](https://github.com/user-attachments/assets/93aff9c4-8550-414a-90a2-1a15a5249d94) ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2025-05-16 16:32:19 +08:00
Kevin Hu	bfe97d896d	Fix: docx get image exception. (#7636 ) ### What problem does this PR solve? Close #7631 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-14 12:24:48 +08:00
Kevin Hu	01330fa428	Feat: let image citation being shown. (#7624 ) ### What problem does this PR solve? #7623 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-13 19:30:05 +08:00
Kevin Hu	321a280031	Feat: add image preview to retrieval test. (#7610 ) ### What problem does this PR solve? #7608 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-13 14:30:36 +08:00
Stephen Hu	573d46a4ef	FIX:ZeroDivisionError when using large page_size in client.retrieve() (#7595 ) ### What problem does this PR solve? Close #7592 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-13 10:46:31 +08:00
alkscr	4ae8f87754	Fix: missing graph resolution and community extraction in graphrag tasks (#7586 ) ### What problem does this PR solve? Whether to apply graph resolution and community extraction is stored in `task["kb_parser_config"]`. However, the previous code read `graphrag_conf` from `task["parser_config"]`, so `with_resolution` and `with_community` were always false. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-05-13 09:21:03 +08:00
alkscr	baa108f5cc	Fix: markdown table conversion error (#7570 ) ### What problem does this PR solve? Since `import markdown.markdown` has been changed to `import markdown` in `rag/app/naive.py`, the previous code for converting markdown tables called the `markdown` module instead of a callable function, which raised an error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-05-12 17:16:55 +08:00
Kevin Hu	5b626870d0	Refa: remove ollama keep alive. (#7560 ) ### What problem does this PR solve? #7518 ### Type of change - [x] Refactoring	2025-05-09 17:51:49 +08:00
Kevin Hu	2ccec93d71	Feat: support cross-lang search. (#7557 ) ### What problem does this PR solve? #7376 #4503 #5710 #7470 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-09 15:32:02 +08:00
Kevin Hu	a14865e6bb	Fix: empty query issue. (#7551 ) ### What problem does this PR solve? #5214 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-09 12:20:19 +08:00
hfrt456	332e6ffbd4	Fix:local_es_tag (#7534 ) Two Case when local Es tag search has result which is filtered by score 1: Doc has empty tag,and not visi LLM 2: Code may use empty examples in Prompt for LLM search tag Co-authored-by: huangfuqunze <huangfuqunze.hfqz@alibaba-inc.com>	2025-05-09 10:17:24 +08:00
WhiteBear	5352bdf4da	Error storing tag in Redis (#7541 ) ### What problem does this PR solve? The parameter positions were incorrect and have been corrected to use keyword argument passing ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-09 10:17:09 +08:00
liuzhenghua	2f768b96e8	perf: optimize figure parser (#7392 ) ### What problem does this PR solve? When parsing documents containing images, the current code uses a single-threaded approach to call the VL model, resulting in extremely slow parsing speed (e.g., parsing a Word document with dozens of images takes over 20 minutes). By switching to a multithreaded approach to call the VL model, the parsing speed can be improved to an acceptable level. ### Type of change - [x] Performance Improvement --------- Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>	2025-05-06 14:39:45 +08:00
Stephen Hu	65537b8200	Fix:Set CUDA_VISIBLE_DEVICES In DefaultEmbedding (#7465 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7420 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-06 14:38:36 +08:00
alkscr	ab27609a64	Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151 ) ### What problem does this PR solve? When any document is removed from a knowledge base that uses a knowledge graph, the graph's `removed_kwd` is set to "Y". However, in the function `graphrag.utils.get_gaph`, the `rebuild_graph` method is skipped and `None` is returned directly when `removed_kwd=Y`, so the remaining part of the graph is abandoned (although old entity data still exists in the db). In addition, the Infinity instance actually skips deleting graph components' `source_id` when removing a document, which may produce a wrong graph after rebuild. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-30 09:43:17 +08:00
Stephen Hu	c88e4b3fc0	Fix: improve recover_pending_tasks timeout (#7408 ) ### What problem does this PR solve? Fix the redis lock will always timeout (change the logic order release lock first) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-29 16:50:39 +08:00
Kevin Hu	c7310f7fb2	Refa: similarity calculations. (#7381 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-04-28 19:17:11 +08:00
Stephen Hu	1a5608d0f8	Fix: Add title_tks for Pictures (#7365 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7362 append title_tks ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-28 13:35:34 +08:00
Neal Davis	23dcbc94ef	feat: replace models of novita (#7360 ) ### What problem does this PR solve? Replace models of novita ### Type of change - [x] Other (please describe): Replace models of novita	2025-04-28 13:35:09 +08:00
Stephen Hu	1662c7eda3	Feat: Markdown add image (#7124 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/6984 1. Markdown parser supports get pictures 2. For Native, when handling Markdown, it will handle images 3. improve merge and ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-25 18:35:28 +08:00
Kevin Hu	b271cc34b3	Fix: LLM generated tag issue. (#7301 ) ### What problem does this PR solve? #7298 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-25 14:38:34 +08:00
Yongteng Lei	97a13ef1ab	Fix: Qwen-vl-plus url error (#7281 ) ### What problem does this PR solve? Fix Qwen-vl-* url error. #7277 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-25 09:20:10 +08:00
pyyuhao	c8c3b756b0	Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140 ) ### What problem does this PR solve? This PR adds support for the latest OpenSearch2.19.1 as the store engine & search engine option for RAGFlow. ### Main Benefit 1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License], which is much better than Elasticsearch 2. For search, OpenSearch2.19.1 supports full-text search, vector_search, and hybrid_search, which are similar to Elasticsearch in schema 3. For storage, OpenSearch2.19.1 stores text and vectors, which are quite similar to Elasticsearch in schema ### Changes - Support opensearch_python_connetor. I made a lot of adaptations since the schema and API/methods of ES and OpenSearch differ in many ways (especially knn_search, which has a significant gap): rag/utils/opensearch_coon.py - Support static config adaptations by changing: conf/service_conf.yaml, api/settings.py, rag/settings.py - Support some store & search schema changes between OpenSearch and ES: conf/os_mapping.json - Support the OpenSearch Python SDK: pyproject.toml - Support docker config for OpenSearch2.19.1: docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template ### How to use - I didn't change the priority of ES as the default doc/search engine. OpenSearch is used only if DOC_ENGINE=${DOC_ENGINE:-opensearch} is set in docker/.env. ### Others Our team tested a lot of docs in our environment using OpenSearch as the vector database, and it works very well. All the config for OpenSearch is necessary. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Yongteng Lei <yongtengrey@outlook.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2025-04-24 16:03:31 +08:00
benni82	216cd7474b	fix: task_executor bug fix (#7253 ) ### What problem does this PR solve? The lock is not released correctly when task_executor is abnormal ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-04-24 11:44:34 +08:00
WhiteBear	2c62652ea8	<think> tag is missing. (#7256 ) ### What problem does this PR solve? Some models force thinking, resulting in the absence of the think tag in the returned content ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-24 11:44:10 +08:00
Yongteng Lei	67dee2d74e	Fix: fix retrieval testing wrong pagination (#7174 ) ### What problem does this PR solve? Fix wrong pagination in retrieval testing. #7171 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-22 15:16:04 +08:00
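
Several of the commits above (#7330, #7700, #7844, #7873) concern the same behavior: with `MAX_CONCURRENT_TASKS` set to X, an executor should take at most X tasks off the queue and leave the rest queued. The commit messages say the real code uses a trio `CapacityLimiter`. The block below is only a minimal sketch of the idea, using the standard library's `asyncio.Semaphore` instead of trio. The names `run_executor` and `handle`, the in-memory `deque` standing in for the Redis queue, and the value of `MAX_CONCURRENT_TASKS` are all assumptions for illustration and are not taken from the RAGFlow source.

```python
import asyncio
from collections import deque

MAX_CONCURRENT_TASKS = 2  # hypothetical value, for illustration only


async def run_executor(queue, max_concurrent=MAX_CONCURRENT_TASKS):
    """Drain `queue`, keeping at most `max_concurrent` tasks in flight.

    `queue` is a deque standing in for the Redis queue (an assumption).
    Returns the list of finished task ids and the peak concurrency seen.
    """
    limiter = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0
    done = []

    async def handle(task_id):
        # Hypothetical task body; the sleep stands in for real work.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            await asyncio.sleep(0.01)
            done.append(task_id)
        finally:
            in_flight -= 1
            limiter.release()  # free the slot so the loop can fetch again

    running = []
    while queue:
        # Wait for a free slot first, and only then take a task off the
        # queue. Tasks beyond the limit stay queued for other executors.
        await limiter.acquire()
        task_id = queue.popleft()
        running.append(asyncio.create_task(handle(task_id)))

    await asyncio.gather(*running)
    return done, peak
```

The ordering is the point of the sketch: the slot is acquired before the task is removed from the queue, so a busy executor never holds more than `max_concurrent` tasks at once.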


1004 Commits