ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2025-12-08 20:42:30 +08:00

Author	SHA1	Message	Date
Kevin Hu	ba71160b14	Refa: rm useless code. (#11238 ) ### Type of change - [x] Refactoring	2025-11-13 09:59:55 +08:00
Jin Hai	f98b24c9bf	Move api.settings to common.settings (#11036 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-06 09:36:38 +08:00
Jin Hai	78631a3fd3	Move some functions out of 'api/utils/common.py' (#10948 ) ### What problem does this PR solve? as title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 12:34:47 +08:00
Jin Hai	44f2d6f5da	Move 'get_project_base_directory' to common directory (#10940 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-02 21:05:28 +08:00
Zhichang Yu	73144e278b	Don't release full image (#10654 ) ### What problem does this PR solve? Introduced gpu profile in .env. Added Dockerfile_tei. Fix datrie. Removed LIGHTEN flag. ### Type of change - [x] Documentation Update - [x] Refactoring	2025-10-23 23:02:27 +08:00
XIANG LI	f631073ac2	Fix OCR GPU provider mem limit handling (#10407 ) ### What problem does this PR solve? - Running DeepDoc OCR on large PDFs inside the GPU docker-compose setup would intermittently fail with [ONNXRuntimeError] ... p2o.Clip.6 ... Available memory of 0 is smaller than requested bytes ... - Root cause: load_model() in deepdoc/vision/ocr.py treated device_id=None as-is. torch.cuda.device_count() > device_id then raised a TypeError, the helper returned False, and ONNXRuntime quietly fell back to CPUExecutionProvider with the hard-coded 512 MB limit, which then triggered the allocator failure. - Environment where this reproduces: Windows 11, AMD 5900x, 64 GB RAM, RTX 3090 (24 GB), docker-compose-gpu.yml from upstream, default DeepDoc + GraphRAG parser settings, ingesting heavy PDF such as 《内科学》（第10版）.pdf (~180 MB). Fixes: - Normalize device_id to 0 when it is None before calling any CUDA APIs, so the GPU path is considered available. - Allow configuring the CUDA provider’s memory cap via OCR_GPU_MEM_LIMIT_MB (default 2048 MB) and expose OCR_ARENA_EXTEND_STRATEGY; the calculated byte limit is logged to confirm the effective settings. After the change, ragflow_server.log shows for example load_model ... uses GPU (device 0, gpu_mem_limit=21474836480, arena_strategy=kNextPowerOfTwo) and the same document finishes OCR without allocator errors. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-10 11:03:12 +08:00
Jin Hai	b0b866c8fd	Refactor: move some functions out of api/utils/__init__.py (#10216 ) ### What problem does this PR solve? Refactor import modules. ### Type of change - [x] Refactoring --------- Signed-off-by: jinhai <haijin.chn@gmail.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-09-25 18:04:49 +08:00
Lynn	341a7b1473	Fix: judge not empty before delete (#10099 ) ### What problem does this PR solve? judge not empty before delete session. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-09-15 17:49:52 +08:00
Lynn	2a88ce6be1	Fix: terminate onnx inference session manually (#10076 ) ### What problem does this PR solve? terminate onnx inference session and release memory manually. Issue #5050 Issue #9992 Issue #8805 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-09-12 17:18:26 +08:00
cwr31	e6d36f3a3a	Improve image rotation logic for text recognition (#8167 ) ### What problem does this PR solve? Enhanced the image rotation handling by evaluating the original orientation, clockwise 90°, and counter-clockwise 90° rotations. The image with the highest text recognition score is now selected, improving accuracy for text detection in images with aspect ratios >= 1.5. #8166 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: wenrui.cao <wenrui.cao@univers.com>	2025-06-11 09:20:30 +08:00
giiiiiithub	6ba5a4348a	set PARALLEL_DEVICES default value = 0 (#7935 ) ### What problem does this PR solve? It would fail if PARALLEL_DEVICES = None in the OCR class, because it passes 0 to the TextDetector and TextRecognizer init methods. It would also be simpler to set 0 as the default value for PARALLEL_DEVICES. ### Type of change - [x] Refactoring	2025-05-29 13:32:16 +08:00
Kevin Hu	3a99c2b5f4	Refa: PARALLEL_DEVICES is a static parameter. (#6168 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-03-17 16:49:54 +08:00
Debug Doctor	3e19044dee	Feat: add OCR's multi-GPU and parallel processing support (#5972 ) ### What problem does this PR solve? Add OCR's multi-GPU and parallel processing support ### Type of change - [x] New Feature (non-breaking change which adds functionality) @yuzhichang I've tried to resolve the comments in #5697. OCR jobs can now be done on both CPU and GPU. ( By the way, I've encountered a “Generate embedding error” issue #5954 that might be due to my outdated GPUs? idk. ) Please review it and give me suggestions. GPU: ![gpu_ocr](https://github.com/user-attachments/assets/0ee2ecfb-a665-4e50-8bc7-15941b9cd80e) ![smi](https://github.com/user-attachments/assets/a2312f8c-cf24-443d-bf89-bec50503546d) CPU: ![cpu_ocr](https://github.com/user-attachments/assets/1ba6bb0b-94df-41ea-be79-790096da4bf1)	2025-03-17 11:58:40 +08:00
yihong	4326873af6	refactor: no need to inherit in python3 clean the code (#5659 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-03-05 18:03:53 +08:00
Zhichang Yu	db42d0e0ae	Optimize ocr (#5297 ) ### What problem does this PR solve? Introduced OCR.recognize_batch ### Type of change - [x] Performance Improvement	2025-02-24 16:21:55 +08:00
Zhichang Yu	0151d42156	Reuse loaded modules if possible (#5231 ) ### What problem does this PR solve? Reuse loaded modules if possible ### Type of change - [x] Refactoring	2025-02-21 17:21:01 +08:00
Zhichang Yu	3411d0a2ce	Added cuda_is_available (#4725 ) ### What problem does this PR solve? Added cuda_is_available ### Type of change - [x] Refactoring	2025-02-05 18:01:23 +08:00
Zhichang Yu	e1526846da	Fixed GPU detection on CPU only environment (#4711 ) ### What problem does this PR solve? Fixed GPU detection on CPU only environment. Close #4692 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-05 12:02:43 +08:00
Zhichang Yu	4230402fbb	deepdoc use GPU if possible (#4618 ) ### What problem does this PR solve? deepdoc use GPU if possible ### Type of change - [x] Refactoring	2025-01-24 09:48:02 +08:00
Jin Hai	3894de895b	Update comments (#4569 ) ### What problem does this PR solve? Add license statement. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-01-21 20:52:28 +08:00
Mathias Panzenböck	4f9f9405b8	Remove use of eval() from ocr.py (#4481 ) `eval(op_name)` -> `getattr(operators, op_name)` ### What problem does this PR solve? Using `eval()` can lead to code injections and is entirely unnecessary here. ### Type of change - [x] Other (please describe): Best practice code improvement, preventing the possibility of code injection.	2025-01-20 09:52:30 +08:00
Zhichang Yu	1254ecf445	Added static check at PR CI (#3921 ) ### What problem does this PR solve? Added static check at PR CI ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2024-12-08 21:23:51 +08:00
Zhichang Yu	0d68a6cd1b	Fix errors detected by Ruff (#3918 ) ### What problem does this PR solve? Fix errors detected by Ruff ### Type of change - [x] Refactoring	2024-12-08 14:21:12 +08:00
Kevin Hu	99adeabc85	remove dependency (#1536 ) ### What problem does this PR solve? #702 ### Type of change - [x] Refactoring	2024-07-16 16:30:17 +08:00
KevinHuSh	453c29170f	make sure the models will not be loaded twice (#422 ) ### What problem does this PR solve? #381 ### Type of change - [x] Refactoring	2024-04-18 09:37:23 +08:00
KevinHuSh	a5384446e3	let's load model from local (#163 )	2024-03-28 16:10:47 +08:00
KevinHuSh	979b3a5b4b	support snapshot download from local (#153 ) * support snapshot download from local * let snapshot download from local	2024-03-27 09:53:42 +08:00
KevinHuSh	da21320b88	fix plainPdf bugs (#152 )	2024-03-26 15:11:07 +08:00
KevinHuSh	9da671b951	refine manual parser (#131 )	2024-03-19 12:26:04 +08:00
KevinHuSh	675a9f8d9a	add dockerfile for cuda environment. Refine table search strategy (#123 )	2024-03-14 19:45:29 +08:00
KevinHuSh	8f86ab9f7f	refine pdf parser, add time zone to userinfo (#112 )	2024-03-08 11:24:24 +08:00
KevinHuSh	7fd1eca582	init README of deepdoc, add picture processor. (#71 ) * init README of deepdoc, add picture processor. * add resume parsing	2024-02-23 18:28:12 +08:00
KevinHuSh	d32322c081	rename vision, add layout and tsr recognizer (#70 ) * rename vision, add layout and tsr recognizer * trivial fixing	2024-02-22 19:11:37 +08:00
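
Commit 4f9f9405b8 replaces `eval(op_name)` with `getattr(operators, op_name)`. The following is a minimal sketch of that pattern, not the code in ocr.py. The `DecodeImage` class, the `SimpleNamespace` stand-in for the `operators` module, and the `create_operators` helper with its list-of-dicts config shape are all assumptions made for illustration.

```python
import types


class DecodeImage:
    """Hypothetical operator class, used only for this illustration."""

    def __init__(self, img_mode="RGB"):
        self.img_mode = img_mode


# Stand-in for the `operators` module named in the commit message.
operators = types.SimpleNamespace(DecodeImage=DecodeImage)


def create_operators(op_param_list):
    """Build operator instances from a list of {name: params} dicts.

    The lookup is restricted to attributes of `operators`, so a name
    such as "__import__('os').system(...)" is never evaluated as code;
    it simply fails with AttributeError.
    """
    ops = []
    for entry in op_param_list:
        # Each entry is assumed to hold exactly one operator name.
        (op_name, params), = entry.items()
        op_cls = getattr(operators, op_name)
        ops.append(op_cls(**(params or {})))
    return ops
```

With `eval()`, the name would be executed as an arbitrary expression. With `getattr()`, it is only ever used as an attribute key, which is what removes the injection risk described in the commit message.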

33 Commits
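
Commit f631073ac2 describes two fixes: normalizing `device_id` to 0 when it is `None`, and deriving the CUDA provider's memory cap from `OCR_GPU_MEM_LIMIT_MB` (default 2048 MB). The following is a minimal sketch of that logic, not the actual `load_model()` code. The helper name `resolve_gpu_settings`, its `env` parameter, and the fallback value for `OCR_ARENA_EXTEND_STRATEGY` are assumptions. The commit message only shows `kNextPowerOfTwo` in a sample log line and does not state the default.

```python
import os


def resolve_gpu_settings(device_id=None, env=None):
    """Hypothetical helper that builds CUDA provider options for OCR.

    Returns a dict using ONNX Runtime's CUDA execution provider option
    names (device_id, gpu_mem_limit, arena_extend_strategy).
    """
    env = os.environ if env is None else env

    # Normalize before any comparison such as `device_count() > device_id`,
    # which raises TypeError when device_id is None.
    device_id = 0 if device_id is None else int(device_id)

    # The limit is configured in MB; the provider option is in bytes.
    limit_mb = int(env.get("OCR_GPU_MEM_LIMIT_MB", "2048"))

    # Assumed fallback; the commit does not state the real default.
    strategy = env.get("OCR_ARENA_EXTEND_STRATEGY", "kNextPowerOfTwo")

    return {
        "device_id": device_id,
        "gpu_mem_limit": limit_mb * 1024 * 1024,
        "arena_extend_strategy": strategy,
    }
```

Under these assumptions, setting `OCR_GPU_MEM_LIMIT_MB=20480` yields `gpu_mem_limit=21474836480`, which matches the byte value in the log line quoted in the commit message.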