ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-31 15:45:08 +08:00

Author	SHA1	Message	Date
HaiyangP	79399f7f25	Support the case of one cell split by multiple columns. (#9225 ) ### What problem does this PR solve? Support the case of one cell split by multiple columns. The code also remains compatible with the common cell case. #8606 can be fixed. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) I provide a case of one cell split by multiple columns: [test.xlsx](https://github.com/user-attachments/files/21578693/test.xlsx) The chunk result: <img width="236" height="57" alt="2025-06-17 16-04-07 screenshot" src="https://github.com/user-attachments/assets/b0a499ac-349d-4c3d-8c6e-0931c8fc26de" />	2025-08-11 17:17:56 +08:00
Jay Xu	7f08ba47d7	Fix "no `tc` element at grid_offset" (#9375 ) ### What problem does this PR solve? fix "no `tc` element at grid_offset", just log warning and ignore. stacktrace: ``` Traceback (most recent call last): File "/ragflow/rag/svr/task_executor.py", line 620, in handle_task await do_handle_task(task) File "/ragflow/rag/svr/task_executor.py", line 553, in do_handle_task chunks = await build_chunks(task, progress_callback) File "/ragflow/rag/svr/task_executor.py", line 257, in build_chunks cks = await trio.to_thread.run_sync(lambda: chunker.chunk(task["name"], binary=binary, from_page=task["from_page"], File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 447, in to_thread_run_sync return msg_from_thread.unwrap() File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap raise captured_error File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 373, in do_release_then_return_result return result.unwrap() File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap raise captured_error File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 392, in worker_fn ret = context.run(sync_fn, *args) File "/ragflow/rag/svr/task_executor.py", line 257, in <lambda> cks = await trio.to_thread.run_sync(lambda: chunker.chunk(task["name"], binary=binary, from_page=task["from_page"], File "/ragflow/rag/app/naive.py", line 384, in chunk sections, tables = Docx()(filename, binary) File "/ragflow/rag/app/naive.py", line 230, in __call__ while i < len(r.cells): File "/ragflow/.venv/lib/python3.10/site-packages/docx/table.py", line 438, in cells return tuple(_iter_row_cells()) File "/ragflow/.venv/lib/python3.10/site-packages/docx/table.py", line 436, in _iter_row_cells yield from iter_tc_cells(tc) File "/ragflow/.venv/lib/python3.10/site-packages/docx/table.py", line 424, in iter_tc_cells yield from iter_tc_cells(tc._tc_above) # pyright: ignore[reportPrivateUsage] File "/ragflow/.venv/lib/python3.10/site-packages/docx/oxml/table.py", line 741, in _tc_above return self._tr_above.tc_at_grid_offset(self.grid_offset) File "/ragflow/.venv/lib/python3.10/site-packages/docx/oxml/table.py", line 98, in tc_at_grid_offset raise ValueError(f"no `tc` element at grid_offset={grid_offset}") ValueError: no `tc` element at grid_offset=10 ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-11 17:13:10 +08:00
Jay Xu	ce3dd019c3	Fix broken data stream when writing image file (#9354 ) ### What problem does this PR solve? fix "broken data stream when writing image file", just log warning and ignore Close #8379 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-11 17:07:49 +08:00
TeslaZY	476c56868d	Agent plans tasks by referring to its own prompt. (#9315 ) ### What problem does this PR solve? Fixes the issue in the analyze_task execution flow where the Lead Agent was not utilizing its own sys_prompt during task analysis, resulting in incorrect or incomplete task planning. https://github.com/infiniflow/ragflow/issues/9294 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-08-11 17:05:06 +08:00
Stephen Hu	7713e14d6a	Update chat_model.py (#9318 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/9317 Based on https://discuss.ai.google.dev/t/valueerror-invalid-operation-the-response-text-quick-accessor-requires-the-response-to-contain-a-valid-part-but-none-were-returned/42866 this error should be handled by retrying. ### Type of change - [x] Refactoring	2025-08-08 14:13:07 +08:00
Kevin Hu	a2e1f5618d	Fix: bytes style image issue. (#9304 ) ### What problem does this PR solve? #9302 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-07 15:20:01 +08:00
Stephen Hu	0a0bfc02a0	Refactor:naive_merge_with_images close useless images (#9296 ) ### What problem does this PR solve? naive_merge_with_images close useless images ### Type of change - [x] Refactoring	2025-08-07 11:07:29 +08:00
He Wang	4fc9e42e74	fix: add missing env vars and default values of service_conf.yaml (#9289 ) ### What problem does this PR solve? Add missing env var `MYSQL_MAX_PACKET` to service_conf.yaml.template, and add default values to opendal config to fix npe. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-07 10:41:05 +08:00
so95	35539092d0	Add kwargs to model base class constructors (#9252 ) Updated constructors for base and derived classes in chat, embedding, rerank, sequence2txt, and tts models to accept kwargs. This change improves extensibility and allows passing additional parameters without breaking existing interfaces. - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: IT: Sop.Son <sop.son@feavn.local> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-07 09:45:37 +08:00
Kevin Hu	9ca86d801e	Refa: add provider info while adding model. (#9273 ) ### What problem does this PR solve? #9248 ### Type of change - [x] Refactoring	2025-08-07 09:40:42 +08:00
Stephen Hu	7efeaf6548	Fix:remove a img close which can not operate (#9267 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/9149#issuecomment-3157129587 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-06 10:59:49 +08:00
gooodboyAo	a7eba61067	FIX: If chunk["content_with_weight"] contains one or more unpaired surrogate characters (such as incomplete emoji or other special characters), then calling .encode("utf-8") directly will raise a UnicodeEncodeError. (#9246 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-06 10:36:50 +08:00
Kevin Hu	2124329e95	Fix: local variable issue. (#9255 ) ### What problem does this PR solve? #9227 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-05 19:24:34 +08:00
yzz	550e65bb22	Fix: PlainParser using fix in presentation (#9239 ) ### What problem does this PR solve? A tiny fix to how `deepdoc.pdf_parser.PlainParser` is used in `rag.app.presentation.chunk`, based on how this class is used elsewhere. The fix is small enough that a separate issue seems unnecessary. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-05 17:48:18 +08:00
Stephen Hu	0a303d9ae1	Refactor:Improve the chat stream logic for NvidiaCV (#9242 ) ### What problem does this PR solve? Improve the chat stream logic for NvidiaCV ### Type of change - [x] Refactoring	2025-08-05 17:47:00 +08:00
Stephen Hu	1deb0a2d42	Fix:local variable 'response' referenced before assignment (#9230 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/9227 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-08-05 11:00:06 +08:00
Yongteng Lei	a249803961	Refa: ensure Redis stream queue could be created properly (#9223 ) ### What problem does this PR solve? Ensure Redis queue could be created properly. ### Type of change - [x] Refactoring	2025-08-05 09:54:31 +08:00
Kevin Hu	6ec3f18e22	Fix: self-deployed LLM error, (#9217 ) ### What problem does this PR solve? Close #9197 Close #9145 ### Type of change - [x] Refactoring - [x] Bug fixing.	2025-08-05 09:49:47 +08:00
Yongteng Lei	30ccc4a66c	Fix: correct single base64 image handling in image prompt (#9220 ) ### What problem does this PR solve? Correct single base64 image handling in image prompt. ![img_v3_02or_ec4757c2-a9d4-4774-9a76-f7c6be633ebg](https://github.com/user-attachments/assets/872a86bf-e2a8-48d1-9b71-2a0c7a35ba9e) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-05 09:26:42 +08:00
Jay Xu	cae11201ef	fix "out of memory" if slide.get_thumbnail() to a huge image (#9211 ) ### What problem does this PR solve? fix "out of memory" if slide.get_thumbnail() to a huge image ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-04 16:08:24 +08:00
Stephen Hu	667c5812d0	Fix:Repeated images when parsing markdown files with images (#9196 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/9149 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-04 13:35:58 +08:00
Stephen Hu	e9cbf4611d	Fix:Error when parsing files using Gemini: ERROR: GENERIC_ERROR - Unknown field for GenerationConfig: max_tokens (#9195 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/9177 The likely cause is that Gemini internally uses a different parameter name: ` max_output_tokens (int): Optional. The maximum number of tokens to include in a response candidate. Note: The default value varies by model, see the ``Model.output_token_limit`` attribute of the ``Model`` returned from the ``getModel`` function. This field is a member of `oneof`_ ``_max_output_tokens``. ` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-04 10:06:09 +08:00
Kevin Hu	a16cd4f110	Refa: add result to callback for agent tool use. (#9137 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-08-01 21:49:39 +08:00
Stephen Hu	5ccdb95008	Refactor:Introduce Image Close For GeminiCV (#9147 ) ### What problem does this PR solve? Introduce Image Close For GeminiCV ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-08-01 12:38:13 +08:00
JI4JUN	aeaeb169e4	Feat/support 302ai provider (#8742 ) ### What problem does this PR solve? Support 302.AI provider. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-31 14:48:30 +08:00
Stephen Hu	20b4d88098	Refactor: Improve the try catch logic for XinferenceEmbed (#9128 ) ### What problem does this PR solve? Improve the try catch logic for XinferenceEmbed ### Type of change - [x] Refactoring	2025-07-31 12:14:50 +08:00
Kevin Hu	d9fe279dde	Feat: Redesign and refactor agent module (#9113 ) ### What problem does this PR solve? #9082 #6365 WARNING: this change is not compatible with older versions of the `Agent` module, which means that an `Agent` from an older version will no longer work. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-30 19:41:09 +08:00
謝富祥	021e8b57ae	Fix: fix error 429 api rate limit when building knowledge graph for all chat model and Mistral embedding model (#9106 ) ### What problem does this PR solve? fix error 429 api rate limit when building knowledge graph for all chat model and Mistral embedding model. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-30 11:37:49 +08:00
Yongteng Lei	39ef2ffba9	Feat: parsing supports jsonl or ldjson format (#9087 ) ### What problem does this PR solve? Supports jsonl or ldjson format. Feature request from [discussion](https://github.com/orgs/infiniflow/discussions/8774). ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-30 09:48:20 +08:00
Stephen Hu	ba563f8095	Update embedding_model.py (#9083 ) ### What problem does this PR solve? Reduce the logic scope for DefaultEmbedding ### Type of change - [x] Refactoring	2025-07-30 09:44:30 +08:00
Zhichang Yu	342a04ec8a	Added infinity rank_feature support (#9044 ) ### What problem does this PR solve? Added infinity rank_feature support ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-29 09:14:23 +08:00
Stephen Hu	86b4da0844	Refactor: Remove Useless split for BedrockEmbed (#9067 ) ### What problem does this PR solve? Remove Useless split for BedrockEmbed ### Type of change - [x] Refactoring	2025-07-28 10:16:38 +08:00
Stephen Hu	53b0b0e583	get keep alive from env (#9039 ) ### What problem does this PR solve? get keepalive from env ### Type of change - [x] Refactoring	2025-07-25 12:16:33 +08:00
pyyuhao	49f3f26622	Bug fix: OpenSearch chunk update some api error (#9032 ) ### What problem does this PR solve? Fix a small bug in chunk updates that does not block the main workflow and occurs when OpenSearch is the doc engine. The bug occurred when enabling or disabling a chunk on the web page “Knowledge Base / Dataset / Chunk”. <img width="2388" height="662" alt="image" src="https://github.com/user-attachments/assets/575987a0-c929-4589-bfa0-ba54e137cfd9" /> It occurred because some API parameters differ between OpenSearch and ES. After the fix, enabling, disabling, and rewriting a chunk all work. I also checked the result on the chat web page. <img width="2394" height="660" alt="image" src="https://github.com/user-attachments/assets/8b899dc6-d769-4e80-8dd8-ad0fbbca5f78" /> I will keep focusing on vector databases, especially OpenSearch. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: 张雨豪 <zhangyh80@chinatelecom.cn> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-07-25 09:57:24 +08:00
Viktor Dmitriyev	b47dcc9108	Fix issue with `keep_alive=-1` for ollama chat model by allowing a user to set an additional configuration option (#9017 ) ### What problem does this PR solve? Fix the issue with `keep_alive=-1` for the ollama chat model by allowing a user to set an additional configuration option. It is a non-breaking change because it still uses the previous default value, `keep_alive=-1`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [X] Performance Improvement - [X] Other (please describe): - Additional configuration option has been added to control behavior of RAGFlow while working with ollama LLM	2025-07-24 11:20:14 +08:00
Yongteng Lei	a2f73af1a4	Fix: typo Bearer token (#8998 ) ### What problem does this PR solve? Typo Bearer token. #8960 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-23 18:10:51 +08:00
Yongteng Lei	7ebc1f0943	Feat: add model provider DeepInfra (#9003 ) ### What problem does this PR solve? Add model provider DeepInfra. This model list comes from our community. NOTE: most endpoints haven't been tested, but they should work as OpenAI does. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-23 18:10:35 +08:00
Stephen Hu	ec21d9a98f	Refactor:remove useless convert for FastEmbed (#8984 ) ### What problem does this PR solve? Remove a useless conversion for FastEmbed ### Type of change - [x] Refactoring	2025-07-23 10:51:48 +08:00
He Wang	175f5eaa90	use quote_plus to escape password in opendal's mysql url (#8976 ) ### What problem does this PR solve? Use `quote_plus` to escape password in opendal's mysql url to support special characters like `#`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-23 10:17:34 +08:00
Kevin Hu	935ce872d8	Refa: remove temperature since some LLMs fail to support. (#8981 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-07-23 10:17:04 +08:00
Stephen Hu	95b9208b13	Fix:Improve float operation when rerank (#8963 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8915 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-22 10:04:00 +08:00
Kevin Hu	c783d90ba3	Perf: set timeout for building chunks. (#8940 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2025-07-21 15:56:45 +08:00
Stephen Hu	46caf6ae72	Refactor improve codes for ranker (#8936 ) ### What problem does this PR solve? Use the normalize method directly ### Type of change - [x] Refactoring	2025-07-21 10:22:20 +08:00
Stephen Hu	92cfbcb382	Fix: when parse markdown support extract image at local (#8906 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8902 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-18 17:06:58 +08:00
Kevin Hu	ecdb1701df	Perf: test llm before RAPTOR. (#8897 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2025-07-17 16:48:50 +08:00
湛露先生	fd97ce3e5a	fix s3 init config . (#8886 ) ### What problem does this PR solve? When ``` if 'signature_version' in self.s3_config:``` and ```if 'addressing_style' in self.s3_config:``` are both true, the config initialization is wrong: the earlier setting is overwritten by the last one. This PR fixes that case. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-07-17 12:10:15 +08:00
Stephen Hu	38b34116dd	Refa: Remove useless convert and fix a bug for DefaultRerank (#8887 ) ### What problem does this PR solve? 1. Fix a bug on retry: `i` needs to be reset. 2. Remove a useless conversion. ### Type of change - [x] Refactoring	2025-07-17 12:09:50 +08:00
Kevin Hu	fbd115773b	Perf: set timeout of some steps in KG. (#8873 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2025-07-16 18:06:03 +08:00
Liu An	9e45fcfdb3	Fix: fix typo in OpenAI error logging message (#8865 ) ### What problem does this PR solve? Correct the logging message from "OpenAI cat_with_tools" to "OpenAI chat_with_tools" in the `_exceptions` method of the `Base` class to accurately reflect the method name and improve error traceability. ### Type of change - [x] Typo	2025-07-16 15:31:57 +08:00
Yongteng Lei	e9b14142a5	Fix: fixed invalid save() arguments for slide thumbnails (#8851 ) ### What problem does this PR solve? Fixed invalid save() arguments for slide thumbnails. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-15 17:19:45 +08:00
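
Commit a7eba61067 above notes that calling `.encode("utf-8")` on text containing an unpaired surrogate raises `UnicodeEncodeError`. The log does not say how the commit resolves this, so the following is only a minimal sketch of one common approach (replacing unencodable characters); the helper name `encode_utf8_safely` is hypothetical and is not taken from the RAGFlow code.

```python
def encode_utf8_safely(text: str) -> bytes:
    """Encode text as UTF-8, replacing unpaired surrogates with '?'.

    Hypothetical helper for illustration; not the RAGFlow implementation.
    """
    return text.encode("utf-8", errors="replace")


# A lone high surrogate, e.g. the first half of a truncated emoji.
broken = "ok \ud83d"

# Strict encoding (the default) fails on the unpaired surrogate.
try:
    broken.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The tolerant version succeeds and substitutes a placeholder byte.
safe = encode_utf8_safely(broken)
```

Using `errors="ignore"` instead would drop the offending character rather than substitute a placeholder; which behavior is preferable depends on how the encoded bytes are used downstream.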


831 Commits
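
Commit 175f5eaa90 in the log above describes using `quote_plus` to escape the password in a MySQL URL so that special characters such as `#` are supported. A minimal sketch of that idea follows; the function name `build_mysql_url`, the URL layout, and the sample credentials are assumptions for illustration, not the actual RAGFlow or opendal configuration code.

```python
from urllib.parse import quote_plus


def build_mysql_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a MySQL connection URL with percent-escaped credentials.

    Hypothetical helper for illustration; the URL layout is an assumption.
    """
    # Without escaping, '#' would start a URL fragment and '@' would be
    # mistaken for the separator between credentials and host.
    return f"mysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


url = build_mysql_url("root", "p#ss@word", "localhost", 3306, "rag_flow")
```

Here `#` is escaped as `%23` and `@` as `%40`, so the password can no longer be misread as URL structure.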