### What problem does this PR solve?
This PR addresses critical memory and CPU resource management issues in
high-concurrency environments (multi-worker setups):
GPU Memory Exhaustion (OOM): Currently, onnxruntime-gpu uses an
aggressive memory arena that does not effectively release VRAM back to
the system after a task completes. In multi-process worker setups ($WS >
4), this leads to BFCArena allocation failures and OOM errors as workers
"hoard" VRAM even when idle. This PR introduces an optional GPU Memory
Arena Shrinkage toggle to mitigate this issue.
CPU Oversubscription: ONNX intra_op and inter_op thread counts are
currently hardcoded to 2. When running many workers, this causes
significant CPU context-switching overhead and degrades performance.
This PR makes these values configurable to match the host's actual CPU
core density.
Multi-GPU Support: The memory management logic has been improved to
dynamically target the correct device_id, ensuring stability on systems
with multiple GPUs.
Transparency: Added detailed initialization logs to help administrators
verify and troubleshoot their ONNX session configurations.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: shakeel <shakeel@lollylaw.com>
### What problem does this PR solve?
- typos
- IDE warnings
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Introduced gpu profile in .env
Added Dockerfile_tei
fix datrie
Removed LIGHTEN flag
### Type of change
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
- Running DeepDoc OCR on large PDFs inside the GPU docker-compose setup
would intermittently fail with
[ONNXRuntimeError] ... p2o.Clip.6 ... Available memory of 0 is smaller
than requested bytes ...
- Root cause: load_model() in deepdoc/vision/ocr.py treated
device_id=None as-is.
torch.cuda.device_count() > device_id then raised a TypeError, the
helper returned False, and ONNXRuntime quietly fell back to
CPUExecutionProvider with
the hard-coded 512 MB limit, which then triggered the allocator failure.
- Environment where this reproduces: Windows 11, AMD 5900x, 64 GB RAM,
RTX 3090 (24 GB), docker-compose-gpu.yml from upstream, default DeepDoc
+ GraphRAG
parser settings, ingesting heavy PDF such as 《内科学》(第10版).pdf (~180 MB).
Fixes:
- Normalize device_id to 0 when it is None before calling any CUDA APIs,
so the GPU path is considered available.
- Allow configuring the CUDA provider’s memory cap via
OCR_GPU_MEM_LIMIT_MB (default 2048 MB) and expose
OCR_ARENA_EXTEND_STRATEGY; the calculated byte
limit is logged to confirm the effective settings.
After the change, ragflow_server.log shows for example
load_model ... uses GPU (device 0, gpu_mem_limit=21474836480,
arena_strategy=kNextPowerOfTwo) and the same document finishes OCR
without allocator errors.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Refactor import modules.
### Type of change
- [x] Refactoring
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
judge not empty before delete session.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
terminate onnx inference session and release memory manually.
Issue #5050
Issue #9992
Issue #8805
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Enhanced the image rotation handling by evaluating the original
orientation, clockwise 90°, and counter-clockwise 90° rotations. The
image with the highest text recognition score is now selected, improving
accuracy for text detection in images with aspect ratios >= 1.5.
#8166
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: wenrui.cao <wenrui.cao@univers.com>
### What problem does this PR solve?
it would be fail if PARALLEL_DEVICES = None in OCR class , because it
pass 0 to TextDetector and TextRecognizer init method.
and It would be simpler to set 0 as the default value for
PARALLEL_DEVICES.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Fixed GPU detection on CPU only environment. Close#4692
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
`eval(op_name)` -> `getattr(operators, op_name)`
### What problem does this PR solve?
Using `eval()` can lead to code injections and is entirely unnecessary
here.
### Type of change
- [x] Other (please describe):
Best practice code improvement, preventing the possibility of code
injection.
### What problem does this PR solve?
Added static check at PR CI
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring