### What problem does this PR solve?
issue:
#10825
change:
remove unexpected keyword argument in table_structure_recognizer logging
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Introduced gpu profile in .env
Added Dockerfile_tei
fix datrie
Removed LIGHTEN flag
### Type of change
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
- Running DeepDoc OCR on large PDFs inside the GPU docker-compose setup
would intermittently fail with
[ONNXRuntimeError] ... p2o.Clip.6 ... Available memory of 0 is smaller
than requested bytes ...
- Root cause: load_model() in deepdoc/vision/ocr.py treated
device_id=None as-is.
torch.cuda.device_count() > device_id then raised a TypeError, the
helper returned False, and ONNXRuntime quietly fell back to
CPUExecutionProvider with
the hard-coded 512 MB limit, which then triggered the allocator failure.
- Environment where this reproduces: Windows 11, AMD 5900x, 64 GB RAM,
RTX 3090 (24 GB), docker-compose-gpu.yml from upstream, default DeepDoc
+ GraphRAG
parser settings, ingesting heavy PDF such as 《内科学》(第10版).pdf (~180 MB).
Fixes:
- Normalize device_id to 0 when it is None before calling any CUDA APIs,
so the GPU path is considered available.
- Allow configuring the CUDA provider’s memory cap via
OCR_GPU_MEM_LIMIT_MB (default 2048 MB) and expose
OCR_ARENA_EXTEND_STRATEGY; the calculated byte
limit is logged to confirm the effective settings.
After the change, ragflow_server.log shows for example
load_model ... uses GPU (device 0, gpu_mem_limit=21474836480,
arena_strategy=kNextPowerOfTwo) and the same document finishes OCR
without allocator errors.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Refactor import modules.
### Type of change
- [x] Refactoring
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Add support for the Ascend table structure recognizer.
Use the environment variable `TABLE_STRUCTURE_RECOGNIZER_TYPE=ascend` to
enable the Ascend table structure recognizer.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Supports Ascend layout recognizer.
Use the environment variable `LAYOUT_RECOGNIZER_TYPE=ascend` to enable
the Ascend layout recognizer, and `ASCEND_LAYOUT_RECOGNIZER_DEVICE_ID=n`
(for example, n=0) to specify the Ascend device ID.
Ensure that you have installed the [ais
tools](https://gitee.com/ascend/tools/tree/master/ais-bench_workload/tool/ais_bench)
properly.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
judge not empty before delete session.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
terminate onnx inference session and release memory manually.
Issue #5050
Issue #9992
Issue #8805
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1. rename var
2. update if statement
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Enhanced the image rotation handling by evaluating the original
orientation, clockwise 90°, and counter-clockwise 90° rotations. The
image with the highest text recognition score is now selected, improving
accuracy for text detection in images with aspect ratios >= 1.5.
#8166
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: wenrui.cao <wenrui.cao@univers.com>
### What problem does this PR solve?
it would be fail if PARALLEL_DEVICES = None in OCR class , because it
pass 0 to TextDetector and TextRecognizer init method.
and It would be simpler to set 0 as the default value for
PARALLEL_DEVICES.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
For the create_inputs method based on np operation to replace for loop
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Optimize OCR garbage identification to reduce unnecessary filtering.
#5713
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The `ocr.res` file is already included in the model directory
`rag/res/deepdoc`, but it doesn't seem to be utilized here.
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
close#5277 by make sure the file close
### Type of change
- [x] Performance Improvement
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
This patch drop useless fastext which is seems useless in the code base
and its very kind of hard install
should close#4498
### Type of change
- [x] Refactoring
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
Optimized Recognizer.sort_X_firstly and Recognizer.sort_Y_firstly
### Type of change
- [x] Performance Improvement
Use `np.float32()` instead.
### What problem does this PR solve?
Using `eval()` can lead to code injections.
I think `eval()` is only used to parse a floating point number here.
This change preserves the correct behavior if the string `"None"` is
supplied. But if that behavior isn't intended then this part could be
just deleted instead, since `np.float32()` is parsing strings anyway:
```Python
if isinstance(scale, str):
scale = eval(scale)
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixed GPU detection on CPU only environment. Close#4692
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Remove usage of `eval()` from postprocess.py
### What problem does this PR solve?
The use of `eval()` is a potential security risk. While the use of
`eval()` is guarded and thus not a security risk normally, `assert`s
aren't run if `-O` or `-OO` is passed to the interpreter, and as such
then the guard would not apply. In any case there is no reason to use
`eval()` here at all.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Other (please describe):
Potential security fix if somehow the passed `modul_name` could be user
controlled.
`eval(op_type)` -> `getattr(operators, op_type)`
### What problem does this PR solve?
Using `eval()` can lead to code injections and is entirely unnecessary
here.
### Type of change
- [x] Other (please describe):
Best practice code improvement, preventing the possibility of code
injection.
`eval(op_name)` -> `getattr(operators, op_name)`
### What problem does this PR solve?
Using `eval()` can lead to code injections and is entirely unnecessary
here.
### Type of change
- [x] Other (please describe):
Best practice code improvement, preventing the possibility of code
injection.
### What problem does this PR solve?
Added static check at PR CI
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Edit chunk shall update instead of insert it. Close#3679
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)