ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-27 21:56:35 +08:00

Author	SHA1	Message	Date
hy89	b0c21b00d9	Refactor: Optimize error handling and support parsing of XLS (Excel 97-2003) files. (#5633 )	2025-03-05 11:55:27 +08:00
Zhichang Yu	c813c1ff4c	Made task_executor async to speedup parsing (#5530 ) ### What problem does this PR solve? Made task_executor async to speedup parsing ### Type of change - [x] Performance Improvement	2025-03-03 18:59:49 +08:00
yihong	8a2542157f	Fix: possible memory leaks close #5277 (#5500 ) ### What problem does this PR solve? Closes #5277 by making sure the file is closed. ### Type of change - [x] Performance Improvement --------- Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-03-03 10:26:45 +08:00
yihong	37aacb3960	Refa: drop useless fasttext (#5470 ) ### What problem does this PR solve? This patch drops fasttext, which seems to be unused in the code base and is quite hard to install; it should close #4498. ### Type of change - [x] Refactoring Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-02-28 14:30:56 +08:00
Yongteng Lei	83d0949498	Fix: fix special delimiter parsing issue (#5448 ) ### What problem does this PR solve? Fix special delimiter parsing issue #5382 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-27 18:33:55 +08:00
Zhichang Yu	db42d0e0ae	Optimize ocr (#5297 ) ### What problem does this PR solve? Introduced OCR.recognize_batch ### Type of change - [x] Performance Improvement	2025-02-24 16:21:55 +08:00
Zhichang Yu	0151d42156	Reuse loaded modules if possible (#5231 ) ### What problem does this PR solve? Reuse loaded modules if possible ### Type of change - [x] Refactoring	2025-02-21 17:21:01 +08:00
Zhichang Yu	c326f14fed	Optimized Recognizer.sort_X_firstly and Recognizer.sort_Y_firstly (#5182 ) ### What problem does this PR solve? Optimized Recognizer.sort_X_firstly and Recognizer.sort_Y_firstly ### Type of change - [x] Performance Improvement	2025-02-20 15:41:12 +08:00
Kevin Hu	b08bb56f6c	Display thinking for deepseek r1 (#4904 ) ### What problem does this PR solve? #4903 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-02-12 15:43:13 +08:00
Mathias Panzenböck	6b389e01b5	Remove use of eval() from operators.py (#4888 ) Use `np.float32()` instead. ### What problem does this PR solve? Using `eval()` can lead to code injections. I think `eval()` is only used to parse a floating point number here. This change preserves the correct behavior if the string `"None"` is supplied. But if that behavior isn't intended then this part could be just deleted instead, since `np.float32()` is parsing strings anyway: ```Python if isinstance(scale, str): scale = eval(scale) ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-12 12:53:42 +08:00
SkyfireWXY	8fcca1b958	fix: big xls file error (#4859 ) ### What problem does this PR solve? If an *.xls file is too large, e.g. >50M, parsing fails with an error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-12 12:39:25 +08:00
Zhichang Yu	3411d0a2ce	Added cuda_is_available (#4725 ) ### What problem does this PR solve? Added cuda_is_available ### Type of change - [x] Refactoring	2025-02-05 18:01:23 +08:00
Zhichang Yu	e1526846da	Fixed GPU detection on CPU only environment (#4711 ) ### What problem does this PR solve? Fixed GPU detection on CPU only environment. Close #4692 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-05 12:02:43 +08:00
Kevin Hu	6f30397bb5	Infinity adapt to graphrag. (#4663 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-27 18:35:18 +08:00
Kevin Hu	1bff6b7333	Fix t_ocr.py for PNG image. (#4625 ) ### What problem does this PR solve? #4586 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-24 11:47:27 +08:00
Zhichang Yu	4230402fbb	deepdoc use GPU if possible (#4618 ) ### What problem does this PR solve? deepdoc use GPU if possible ### Type of change - [x] Refactoring	2025-01-24 09:48:02 +08:00
Mathias Panzenböck	1a367664f1	Remove usage of eval() from postprocess.py (#4571 ) Remove usage of `eval()` from postprocess.py ### What problem does this PR solve? The use of `eval()` is a potential security risk. While the use of `eval()` is guarded and thus not a security risk normally, `assert`s aren't run if `-O` or `-OO` is passed to the interpreter, and as such then the guard would not apply. In any case there is no reason to use `eval()` here at all. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Other (please describe): Potential security fix if somehow the passed `modul_name` could be user controlled.	2025-01-22 19:37:24 +08:00
Jin Hai	3894de895b	Update comments (#4569 ) ### What problem does this PR solve? Add license statement. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-01-21 20:52:28 +08:00
Mathias Panzenböck	75e1981e13	Remove use of eval() from recognizer.py (#4480 ) `eval(op_type)` -> `getattr(operators, op_type)` ### What problem does this PR solve? Using `eval()` can lead to code injections and is entirely unnecessary here. ### Type of change - [x] Other (please describe): Best practice code improvement, preventing the possibility of code injection.	2025-01-20 09:52:47 +08:00
Mathias Panzenböck	4f9f9405b8	Remove use of eval() from ocr.py (#4481 ) `eval(op_name)` -> `getattr(operators, op_name)` ### What problem does this PR solve? Using `eval()` can lead to code injections and is entirely unnecessary here. ### Type of change - [x] Other (please describe): Best practice code improvement, preventing the possibility of code injection.	2025-01-20 09:52:30 +08:00
Kevin Hu	c852a6dfbf	Accelerate titles' embeddings. (#4492 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2025-01-15 15:20:29 +08:00
Kevin Hu	e478586a8e	Refactor. (#4487 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-01-15 14:06:46 +08:00
Zhi-Qiang You	b7ce4e7e62	fix:t_recognizer TypeError: 'super' object is not callable (#4404 ) ### What problem does this PR solve? [Bug]: layout recognizer failed for wrong boxes class type #4230 (https://github.com/infiniflow/ragflow/issues/4230) ### Type of change - [✅ ] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: youzhiqiang <zhiqiang.you@aminer.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-01-08 10:59:35 +08:00
Kevin Hu	2e40c2a6f6	Fix t_recognizer issue. (#4387 ) ### What problem does this PR solve? #4230 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-07 13:17:46 +08:00
Kevin Hu	983ec0666c	Fix param error. (#4355 ) ### What problem does this PR solve? #4230 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-06 13:54:17 +08:00
Kevin Hu	59a78408be	Fix t_recognizer.py after model updating. (#4330 ) ### What problem does this PR solve? #4230 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-02 17:00:11 +08:00
Kevin Hu	76cd23eecf	Catch the exception while parsing pptx. (#4202 ) ### What problem does this PR solve? #4189 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-24 10:49:28 +08:00
Kevin Hu	2cbe064080	Add Llama3.3 (#4174 ) ### What problem does this PR solve? #4168 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-23 11:18:01 +08:00
ly0303521	101b8ff813	fix chunk method "Table" losing content when the Excel file has multiple sheets (#4123 ) ### What problem does this PR solve? discussed in https://github.com/infiniflow/ragflow/pull/4102 - In excel_parser.py, `total` means the total number of rows in Excel, but it is returned in the first iteration, which leads to the wrong `to_page` - In table.py, when an Excel file has multiple sheets, it is divided into multiple parts, each of size 3000; `data` may be empty because it was already recorded in the previous iteration. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-19 17:30:26 +08:00
Kevin Hu	ce1e855328	Upgrades Document Layout Analysis model. (#4054 ) ### What problem does this PR solve? #4052 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-17 11:27:19 +08:00
Jin Hai	275b5d14f2	Fix json file parse (#4004 ) ### What problem does this PR solve? Fix json file parsing ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-12-12 20:34:46 +08:00
Zhichang Yu	9a6d976252	Add back beartype (#3967 ) ### What problem does this PR solve? Add back beartype ### Type of change - [x] Refactoring	2024-12-10 18:43:43 +08:00
Zhichang Yu	1254ecf445	Added static check at PR CI (#3921 ) ### What problem does this PR solve? Added static check at PR CI ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2024-12-08 21:23:51 +08:00
Zhichang Yu	0d68a6cd1b	Fix errors detected by Ruff (#3918 ) ### What problem does this PR solve? Fix errors detected by Ruff ### Type of change - [x] Refactoring	2024-12-08 14:21:12 +08:00
Jin Hai	821fdf02b4	Fix parsing JSON file error (#3829 ) ### What problem does this PR solve? Close issue: #3828 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-12-03 19:02:03 +08:00
Yuhao Tsui	7b6a5ffaff	Fix: page_chars attribute does not exist in some formats of PDF (#3796 ) ### What problem does this PR solve? In #3335 someone suggested to upgrade pdfplumber==0.11.1, but that didn't solve it. It's actually the special formatting in some of the pdfs that triggers the problem. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-03 11:08:06 +08:00
Kevin Hu	7058ac0041	Fix out of boundary. (#3786 ) ### What problem does this PR solve? #3769 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-02 11:38:53 +08:00
Zhichang Yu	bc701d7b4c	Edit chunk shall update instead of insert it (#3709 ) ### What problem does this PR solve? Edit chunk shall update instead of insert it. Close #3679 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-28 13:00:38 +08:00
Zhichang Yu	2249d5d413	Always open text file for write with UTF-8 (#3688 ) ### What problem does this PR solve? Always open text file for write with UTF-8. Close #932 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-27 16:24:16 +08:00
Zhichang Yu	cad341e794	Added kb_id filter to knn. Fix #3458 (#3513 ) ### What problem does this PR solve? Added kb_id filter to knn. Fix #3458 - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-20 20:53:30 +08:00
Zhichang Yu	4413683898	Introduced beartype (#3460 ) ### What problem does this PR solve? Introduced [beartype](https://github.com/beartype/beartype) for runtime type-checking. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-11-18 17:38:17 +08:00
Jin Hai	1e90a1bf36	Move settings initialization after module init phase (#3438 ) ### What problem does this PR solve? 1. Module init won't connect database any more. 2. Config in settings need to be used with settings.CONFIG_NAME ### Type of change - [x] Refactoring Signed-off-by: jinhai <haijin.chn@gmail.com>	2024-11-15 17:30:56 +08:00
Zhichang Yu	30f6421760	Use consistent log file names, introduced initLogger (#3403 ) ### What problem does this PR solve? Use consistent log file names, introduced initLogger ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [x] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2024-11-14 17:13:48 +08:00
Kevin Hu	4caf932808	fix bug about fetching knowledge graph (#3394 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-14 12:29:15 +08:00
Zhichang Yu	a2a5631da4	Rework logging (#3358 ) Unified all log files into one. ### What problem does this PR solve? Unified all log files into one. ### Type of change - [x] Refactoring	2024-11-12 17:35:13 +08:00
kuschzzp	9c6cc20356	Fix:#3230 When parsing a docx file using the Book parsing method, to_page is always -1, resulting in a block count of 0 even if parsing is successful (#3249 ) ### What problem does this PR solve? When parsing a docx file using the Book parsing method, to_page is always -1, resulting in a block count of 0 even if parsing is successful Fix:#3230 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2024-11-08 09:21:42 +08:00
Kevin Hu	2d1fbefdb5	search between multiple indices for team function (#3079 ) ### What problem does this PR solve? #2834 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-10-29 13:19:01 +08:00
Kevin Hu	bfc07fe4f9	bigger resolution for OCR (#2919 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2024-10-21 16:25:42 +08:00
chongchuanbing	66172cef3e	fix: torch dependency start error (#2777 ) ### What problem does this PR solve? When using the slim image, remove the `torch` dependency. ### Type of change - [✓] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: chongchuanbing <chongchuanbing@gmail.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2024-10-10 10:06:03 +08:00
Ikko Eltociear Ashimine	c552a02e7f	chore: update operators.py (#2724 ) ### What problem does this PR solve? substract -> subtract ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [x] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2024-10-08 10:34:52 +08:00


135 Commits
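
Several commits in this log (#4480, #4481, #4571) replace `eval(op_name)` with `getattr(operators, op_name)`. The block below is a minimal sketch of that pattern, not the project's actual code: the `operators` namespace, the `NormalizeImage` class, and the `create_operator` helper are hypothetical stand-ins introduced only for illustration. It also uses an explicit `raise` rather than an `assert` as the guard, since, as #4571 notes, `assert` statements are skipped when the interpreter runs with `-O` or `-OO`.

```python
import types

# Hypothetical stand-in for a module that holds operator classes
# (an assumption for this sketch, not ragflow's real operators module).
operators = types.SimpleNamespace()


class NormalizeImage:
    """Toy operator that only stores its scale parameter."""

    def __init__(self, scale=1.0):
        self.scale = scale


operators.NormalizeImage = NormalizeImage


def create_operator(op_name, **params):
    """Build an operator by name via attribute lookup instead of eval()."""
    # Reject private/dunder names so only public attributes are reachable.
    if not isinstance(op_name, str) or op_name.startswith("_"):
        raise ValueError(f"invalid operator name: {op_name!r}")
    # getattr() only resolves an attribute; it never executes the string.
    op_class = getattr(operators, op_name, None)
    if op_class is None or not callable(op_class):
        raise ValueError(f"unknown operator: {op_name!r}")
    return op_class(**params)
```

With this lookup, a name such as `"NormalizeImage"` resolves to the class, while a string like `"__import__('os')"` is rejected with a `ValueError` instead of being executed.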