ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-23 03:26:53 +08:00

Author	SHA1	Message	Date
aidan	420c97199a	Feat: Add TCADP parser for PPTX and spreadsheet document types. (#11041 ) ### What problem does this PR solve? - Added TCADP Parser configuration fields to PDF, PPT, and spreadsheet parsing forms - Implemented support for setting table result type (Markdown/HTML) and Markdown image response type (URL/Text) - Updated TCADP Parser to handle return format settings from configuration or parameters - Enhanced frontend to dynamically show TCADP options based on selected parsing method - Modified backend to pass format parameters when calling TCADP API - Optimized form default value logic for TCADP configuration items - Updated multilingual resource files for new configuration options ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-20 10:08:42 +08:00
Billy Bao	0884e9a4d9	Fix: bbox not included in mineru output (#11365 ) ### What problem does this PR solve? Fix: bbox not included in mineru output #11315 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-19 13:59:32 +08:00
Yongteng Lei	c2b7c305fa	Fix: crop index may be out of range (#11341 ) ### What problem does this PR solve? The crop index may be out of range. #11323 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-18 17:01:54 +08:00
Billy Bao	fea157ba08	Fix: manual parser with mineru (#11336 ) ### What problem does this PR solve? Fix: manual parser with mineru #11320 Fix: missing parameter in mineru #11334 Fix: add outlines parameter for pdf parsers ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-18 15:22:52 +08:00
Billy Bao	e7e89d3ecb	Doc: style fix (#11295 ) ### What problem does this PR solve? Style fix based on #11283 ### Type of change - [x] Documentation Update	2025-11-17 11:16:34 +08:00
Stephen Hu	12db62b9c7	Refactor: improve mineru_parser get property logic (#11268 ) ### What problem does this PR solve? improve mineru_parser get property logic ### Type of change - [x] Refactoring	2025-11-14 16:32:35 +08:00
Kevin Hu	ba71160b14	Refa: rm useless code. (#11238 ) ### Type of change - [x] Refactoring	2025-11-13 09:59:55 +08:00
buua436	8ef2f79d0a	Fix:reset the agent component’s output (#11222 ) ### What problem does this PR solve? change: “After each dialogue turn, the agent component’s output is not reset.” ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-13 09:49:12 +08:00
Jin Hai	f98b24c9bf	Move api.settings to common.settings (#11036 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-06 09:36:38 +08:00
Billy Bao	121c51661d	Fix: Markdown table extractor (#11018 ) ### What problem does this PR solve? Now markdown table extractor supports <table ...>. #10966 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-05 16:10:21 +08:00
Jin Hai	bab3fce136	Move some constants to common (#11004 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 08:01:39 +08:00
Yongteng Lei	2677617f93	Feat: supports MinerU http-client/server method (#10961 ) ### What problem does this PR solve? Add support for MinerU http-client/server method. To use MinerU with vLLM server: 1. Set up a vLLM server running MinerU: ```bash mineru-vllm-server --port 30000 ``` 2. Configure the following environment variables: - `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru` (or the path to your MinerU executable) - `MINERU_BACKEND="vlm-http-client"` - `MINERU_SERVER_URL="http://your-vllm-server-ip:30000"` 3. Follow the standard MinerU setup steps as described above. With this configuration, RAGFlow will connect to your vLLM server to perform document parsing, which can significantly improve parsing performance for complex documents while reducing the resource requirements on your RAGFlow server. ![1](https://github.com/user-attachments/assets/46624a0c-0f3b-423e-ace8-81801e97a27d) ![2](https://github.com/user-attachments/assets/66ccc004-a598-47d4-93cb-fe176834f83b) ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2025-11-04 16:03:30 +08:00
Jin Hai	1e45137284	Move 'timeout' to common folder (#10983 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 11:51:12 +08:00
Kevin Hu	3e5a39482e	Feat: Support multiple data sources synchronizations (#10954 ) ### What problem does this PR solve? #10953 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-03 19:59:18 +08:00
Jin Hai	1284647694	Refactor file utils (#10970 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 18:54:55 +08:00
Jin Hai	076d811086	Introduce common/config_utils.py (#10968 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 17:25:06 +08:00
Jin Hai	78631a3fd3	Move some functions out of 'api/utils/common.py' (#10948 ) ### What problem does this PR solve? as title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 12:34:47 +08:00
Jin Hai	360f5c1179	Move token related functions to common (#10942 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 08:50:05 +08:00
Jin Hai	44f2d6f5da	Move 'get_project_base_directory' to common directory (#10940 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-02 21:05:28 +08:00
Stephen Hu	09dd786674	Fix:KeyError: 'table_body' of mineru parser (#10773 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/10769 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-31 10:07:56 +08:00
buua436	bb9504d1cc	Fix:enhance delimiters in markdown parser (#10896 ) ### What problem does this PR solve? issue: [#10890](https://github.com/infiniflow/ragflow/issues/10890) change： enhance delimiters in markdown parser ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-30 17:36:51 +08:00
Edward Chen	b52f09adfe	Mineru api support (#10874 ) ### What problem does this PR solve? Supports a local MinerU API from a Docker instance, for example when WSL on Windows has no GPU but a MinerU API with GPU support is available. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-10-30 17:31:46 +08:00
Billy Bao	057ae646f2	Fix: logging issues (#10836 ) ### What problem does this PR solve? Fix: logging issues #10835 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-28 14:10:47 +08:00
buua436	60a6cf7c7a	Fix:remove unexpected keyword argument in table_structure_recognizer logging (#10831 ) ### What problem does this PR solve? issue: #10825 change: remove unexpected keyword argument in table_structure_recognizer logging ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-28 11:02:43 +08:00
Billy Bao	e59458c36b	Fix: parsing excel with chartsheet & Clamp begin to a minimum of 0 to prevent negative indexing (#10819 ) ### What problem does this PR solve? Fix: parsing excel with chartsheet #10815 Fix: Clamp begin to a minimum of 0 to prevent negative indexing #10804 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-28 09:40:37 +08:00
Yongteng Lei	5acc407240	Feat: MinerU supports VLM-Transformers backend (#10809 ) ### What problem does this PR solve? MinerU supports the VLM-Transformers backend. Set `MINERU_BACKEND` to choose the backend, for example `MINERU_BACKEND="pipeline"`. (Options: pipeline | vlm-transformers; the default is pipeline.) ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-10-27 17:04:13 +08:00
aidan	33a189f620	Feat: add TCADP Parser (#10775 ) ### What problem does this PR solve? This PR adds a new TCADP (Tencent Cloud Advanced Document Processing) parser to RAGFlow, enabling users to leverage Tencent Cloud's document parsing capabilities for more accurate and structured document processing. The implementation includes: New TCADP Parser: A complete implementation of Tencent Cloud's document parsing API without SDK dependency Configuration Support: Added configuration options in service_conf.yaml for Tencent Cloud API credentials Frontend Integration: Updated UI components to support the new TCADP parser option Error Handling: Comprehensive error handling and retry mechanisms for API calls Result Processing: Support for both SSE streaming and JSON response formats from Tencent Cloud API ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-10-27 15:14:58 +08:00
Zhichang Yu	73144e278b	Don't release full image (#10654 ) ### What problem does this PR solve? - Introduced gpu profile in .env - Added Dockerfile_tei - Fix datrie - Removed LIGHTEN flag ### Type of change - [x] Documentation Update - [x] Refactoring	2025-10-23 23:02:27 +08:00
buua436	0ff2042fc1	Feat: add Docling parser (#10759 ) ### What problem does this PR solve? issue: #3945 change: add Docling parser ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-10-23 19:44:25 +08:00
buua436	41fade3fe6	Fix:wrong param in manual chunk (#10710 ) ### What problem does this PR solve? change: wrong param in manual chunk ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-21 20:10:54 +08:00
Stephen Hu	9d12380806	Fix: Excel2HTML can't support XLS（Excel 97-2003） (#10660 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/10602 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-21 09:52:59 +08:00
buua436	6ab96287c9	Feat:Vision Model Image Enhancement in Manual/Paper/Book/One chunker (#10640 ) ### What problem does this PR solve? issue: [#7472](https://github.com/infiniflow/ragflow/issues/7472) change: Vision Model Image Enhancement in Manual chunker ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-10-21 09:36:27 +08:00
Yongteng Lei	387baf858f	Feat: add MinerU parser (#10621 ) ### What problem does this PR solve? Add MinerU parser. #3945, #8092. Set `MINERU_EXECUTABLE` to the MinerU executable path, defaults to `mineru`. Set `MINERU_DELETE_OUTPUT=0` to preserve MinerU's output, default is 1, which deletes temporary output. Set `MINERU_OUTPUT_DIR` to choose the MinerU output directory (uses the temporary directory if unset). ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-10-17 09:55:39 +08:00
Yongteng Lei	5200711441	Feat: add support for multi-column PDF parsing (#10475 ) ### What problem does this PR solve? Add support for multi-columns PDF parsing. #9878, #9919. Two-column sample: <img width="1885" height="1020" alt="image" src="https://github.com/user-attachments/assets/0270c028-2db8-4ca6-a4b7-cd5830882d28" /> Three-column sample: <img width="1881" height="992" alt="image" src="https://github.com/user-attachments/assets/9ee88844-d5b1-4927-9e4e-3bd810d6e03a" /> Single-column sample: <img width="1883" height="1042" alt="image" src="https://github.com/user-attachments/assets/e93d3d18-43c3-4067-b5fa-e454ed0ab093" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-10-11 18:46:09 +08:00
Kevin Hu	7d2f65671f	Feat: debugging toc part. (#10486 ) ### What problem does this PR solve? #10436 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-10-11 18:45:21 +08:00
Billy Bao	534fa60b2a	Fix: Agent.reset() argument wrong #10463 & Unable to converse with agent through Python API. #10415 (#10472 ) ### What problem does this PR solve? Fix: Agent.reset() argument wrong #10463 & Unable to converse with agent through Python API. #10415 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-10 20:44:05 +08:00
Kevin Hu	0d8791936e	Feat: TOC retrieval (#10456 ) ### What problem does this PR solve? #10436 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-10-10 17:07:55 +08:00
XIANG LI	f631073ac2	Fix OCR GPU provider mem limit handling (#10407 ) ### What problem does this PR solve? - Running DeepDoc OCR on large PDFs inside the GPU docker-compose setup would intermittently fail with [ONNXRuntimeError] ... p2o.Clip.6 ... Available memory of 0 is smaller than requested bytes ... - Root cause: load_model() in deepdoc/vision/ocr.py treated device_id=None as-is. torch.cuda.device_count() > device_id then raised a TypeError, the helper returned False, and ONNXRuntime quietly fell back to CPUExecutionProvider with the hard-coded 512 MB limit, which then triggered the allocator failure. - Environment where this reproduces: Windows 11, AMD 5900x, 64 GB RAM, RTX 3090 (24 GB), docker-compose-gpu.yml from upstream, default DeepDoc + GraphRAG parser settings, ingesting heavy PDF such as 《内科学》（第10版）.pdf (~180 MB). Fixes: - Normalize device_id to 0 when it is None before calling any CUDA APIs, so the GPU path is considered available. - Allow configuring the CUDA provider’s memory cap via OCR_GPU_MEM_LIMIT_MB (default 2048 MB) and expose OCR_ARENA_EXTEND_STRATEGY; the calculated byte limit is logged to confirm the effective settings. After the change, ragflow_server.log shows for example load_model ... uses GPU (device 0, gpu_mem_limit=21474836480, arena_strategy=kNextPowerOfTwo) and the same document finishes OCR without allocator errors. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-10 11:03:12 +08:00
Billy Bao	f04c9e2937	Fix: correctly update parser method & correct vllm pdf parser (#10441 ) ### What problem does this PR solve? Fix: correctly update parser method ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue)	2025-10-09 19:03:12 +08:00
Kevin Hu	cbf04ee470	Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423 ) ### What problem does this PR solve? #9869 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: jinhai <haijin.chn@gmail.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: chanx <1243304602@qq.com> Co-authored-by: balibabu <cike8899@users.noreply.github.com> Co-authored-by: Lynn <lynn_inf@hotmail.com> Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com> Co-authored-by: huangzl <huangzl@shinemo.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Wilmer <33392318@qq.com> Co-authored-by: Adrian Weidig <adrianweidig@gmx.net> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yongteng Lei <yongtengrey@outlook.com> Co-authored-by: Liu An <asiro@qq.com> Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com> Co-authored-by: BadwomanCraZY <511528396@qq.com> Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com> Co-authored-by: Russell Valentine <russ@coldstonelabs.org> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Billy Bao <newyorkupperbay@gmail.com> Co-authored-by: Zhedong Cen <cenzhedong2@126.com> Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com> Co-authored-by: TensorNull <tensor.null@gmail.com> Co-authored-by: TeslaZY <TeslaZY@outlook.com> Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com> Co-authored-by: AB <aj@Ajays-MacBook-Air.local> Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com> Co-authored-by: He Wang <wanghechn@qq.com> Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com> Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box> Co-authored-by: Stephen Hu <stephenhu@seismic.com> Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com> Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com> Co-authored-by: mxc <mxc@example.com> Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com> Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com> Co-authored-by: mcoder6425 <mcoder64@gmail.com> Co-authored-by: lemsn <lemsn@msn.com> Co-authored-by: lemsn <lemsn@126.com> Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com> Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com> Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>	2025-10-09 12:36:19 +08:00
Jin Hai	b0b866c8fd	Refactor: move some functions out of api/utils/__init__.py (#10216 ) ### What problem does this PR solve? Refactor import modules. ### Type of change - [x] Refactoring --------- Signed-off-by: jinhai <haijin.chn@gmail.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-09-25 18:04:49 +08:00
Jin Hai	4eb7659499	Fix bug: broken import from rag.prompts.prompts (#10217 ) ### What problem does this PR solve? Fix broken imports ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: jinhai <haijin.chn@gmail.com>	2025-09-23 10:19:25 +08:00
Lynn	62d35b1b73	Fix: handle zero (#10149 ) ### What problem does this PR solve? Handle zero and nan in calculate. #10125 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-09-18 16:28:03 +08:00
Lynn	d353f7f7f8	Feat/parse audio (#10133 ) ### What problem does this PR solve? Dataflow supports audio, and giteeAI's sequence2text model is fixed. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-09-18 09:31:32 +08:00
buua436	c9ea22ef69	Fix: set default chunk_token_num in html_parser (#10118 ) ### What problem does this PR solve? issue: [Bug]: Agent component (HTTP Request) "'>' not supported between instances of 'int' and 'NoneType'" [#10096](https://github.com/infiniflow/ragflow/issues/10096) Change: When the Invoke class instantiates HtmlParser without providing the chunk_token_num parameter, the value defaults to None, leading to a comparison error with block_token_count. This change sets the default chunk_token_num to 512 to prevent such errors. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: BadwomanCraZY <511528396@qq.com>	2025-09-17 09:36:31 +08:00
Yongteng Lei	86f6da2f74	Feat: add support for the Ascend table structure recognizer (#10110 ) ### What problem does this PR solve? Add support for the Ascend table structure recognizer. Use the environment variable `TABLE_STRUCTURE_RECOGNIZER_TYPE=ascend` to enable the Ascend table structure recognizer. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-09-16 13:57:06 +08:00
Yongteng Lei	bc0281040b	Feat: add support for the Ascend layout recognizer (#10105 ) ### What problem does this PR solve? Supports Ascend layout recognizer. Use the environment variable `LAYOUT_RECOGNIZER_TYPE=ascend` to enable the Ascend layout recognizer, and `ASCEND_LAYOUT_RECOGNIZER_DEVICE_ID=n` (for example, n=0) to specify the Ascend device ID. Ensure that you have installed the [ais tools](https://gitee.com/ascend/tools/tree/master/ais-bench_workload/tool/ais_bench) properly. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-09-16 09:51:15 +08:00
Lynn	341a7b1473	Fix: judge not empty before delete (#10099 ) ### What problem does this PR solve? judge not empty before delete session. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-09-15 17:49:52 +08:00
Lynn	2a88ce6be1	Fix: terminate onnx inference session manually (#10076 ) ### What problem does this PR solve? terminate onnx inference session and release memory manually. Issue #5050 Issue #9992 Issue #8805 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-09-12 17:18:26 +08:00
Yongteng Lei	0d9c1f1c3c	Feat: dataflow supports Spreadsheet and Word processor document (#9996 ) ### What problem does this PR solve? Dataflow supports Spreadsheet and Word processor document ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-09-10 13:02:53 +08:00
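
The device handling described in the f631073ac2 entry (#10407) can be sketched roughly as follows. This is a minimal illustration, not the code in deepdoc/vision/ocr.py: the helper name `resolve_cuda_options` is hypothetical, and the fallback value for `OCR_ARENA_EXTEND_STRATEGY` is an assumption taken from the example log line in that entry. The returned keys follow the option names of ONNX Runtime's CUDA execution provider.

```python
import os


def resolve_cuda_options(device_id=None, environ=None):
    """Hypothetical helper; a sketch of the fix described in #10407."""
    env = os.environ if environ is None else environ

    # Normalize a missing device id to 0 so later comparisons against a
    # device count never see None (the TypeError described in the entry).
    device_id = 0 if device_id is None else int(device_id)

    # Memory cap in MB; 2048 is the default stated in the commit message.
    mem_limit_mb = int(env.get("OCR_GPU_MEM_LIMIT_MB", "2048"))

    # Assumption: fall back to kNextPowerOfTwo, as seen in the sample log.
    strategy = env.get("OCR_ARENA_EXTEND_STRATEGY", "kNextPowerOfTwo")

    return {
        "device_id": device_id,
        "gpu_mem_limit": mem_limit_mb * 1024 * 1024,  # bytes
        "arena_extend_strategy": strategy,
    }


if __name__ == "__main__":
    print(resolve_cuda_options())
```

With `OCR_GPU_MEM_LIMIT_MB=20480`, the computed byte limit is 21474836480, which matches the value in the log line quoted in that entry.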

235 Commits
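
The MinerU environment variables named in the 387baf858f, 5acc407240 and 2677617f93 entries could be read along these lines. This is a sketch under stated assumptions, not RAGFlow's own loader: the function name `read_mineru_settings` is hypothetical, treating any value other than `0` as "delete output" is an assumption, and so is the check that `vlm-http-client` requires `MINERU_SERVER_URL`.

```python
import os


def read_mineru_settings(environ=None):
    """Hypothetical reader for the MinerU variables listed in the log."""
    env = os.environ if environ is None else environ

    settings = {
        # Defaults below are the ones stated in the commit messages.
        "executable": env.get("MINERU_EXECUTABLE", "mineru"),
        "backend": env.get("MINERU_BACKEND", "pipeline"),
        "server_url": env.get("MINERU_SERVER_URL", ""),
        # Empty string means "use a temporary directory".
        "output_dir": env.get("MINERU_OUTPUT_DIR", ""),
        # MINERU_DELETE_OUTPUT=0 preserves output; the default 1 deletes it.
        "delete_output": env.get("MINERU_DELETE_OUTPUT", "1") != "0",
    }

    # Assumption: the http-client backend is unusable without a server URL.
    if settings["backend"] == "vlm-http-client" and not settings["server_url"]:
        raise ValueError(
            "MINERU_SERVER_URL must be set for the vlm-http-client backend"
        )
    return settings


if __name__ == "__main__":
    print(read_mineru_settings())
```

Passing a plain dict instead of relying on `os.environ` keeps the sketch easy to try with different configurations.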