mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-01-31 15:45:08 +08:00
[Feat]Automatic table orientation detection and correction (#12719)
### What problem does this PR solve?

This PR introduces automatic table orientation detection and correction within the PDF parser. This ensures that tables in PDFs are correctly oriented before structure recognition, improving overall parsing accuracy.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
@ -103,6 +103,31 @@ We use vision information to resolve problems as human being.

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/cb24e81b-f2ba-49f3-ac09-883d75606f4c" width="1000"/>
</div>

- **Table Auto-Rotation**. Tables in scanned PDFs may be incorrectly oriented (rotated by 90°, 180°, or 270°).
Before performing table structure recognition, the PDF parser automatically detects the best rotation angle
using OCR confidence scores. This improves OCR accuracy and table structure detection for rotated tables.

The feature evaluates four rotation angles (0°, 90°, 180°, 270°) and selects the one with the highest OCR confidence.
After determining the best orientation, it reruns OCR on the correctly rotated table image.

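The selection step described above (try four angles, keep the most confident one) can be sketched in a few lines. This is only an illustration under stated assumptions, not the parser's actual implementation: `rotate90_ccw`, `best_rotation`, and `fake_ocr` are hypothetical names, the image is a toy list of pixel rows, the score is assumed to be the mean of per-text-box confidences, and angles are assumed to mean counter-clockwise rotation.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]  # toy grayscale image: a list of pixel rows


def rotate90_ccw(img: Image, times: int) -> Image:
    """Rotate a row-major image counter-clockwise by 90 degrees, `times` times."""
    for _ in range(times % 4):
        # Transposing and then reversing the row order is a 90-degree CCW turn.
        img = [list(row) for row in zip(*img)][::-1]
    return img


def best_rotation(
    img: Image,
    ocr_confidences: Callable[[Image], Sequence[float]],
    angles: Sequence[int] = (0, 90, 180, 270),
) -> Tuple[int, float, Image]:
    """Return (angle, mean confidence, rotated image) for the best-scoring angle.

    Assumption: the mean of the per-box OCR confidences is used as the score.
    Ties keep the earlier angle, so 0 degrees wins when nothing scores higher.
    """
    best = None
    for angle in angles:
        candidate = rotate90_ccw(img, angle // 90)
        scores = ocr_confidences(candidate)
        mean = sum(scores) / len(scores) if scores else 0.0
        if best is None or mean > best[1]:
            best = (angle, mean, candidate)
    return best


def fake_ocr(img: Image) -> List[float]:
    """Stand-in for a real OCR engine: confident only when the marker pixel
    sits in the top-left corner, which plays the role of 'upright'."""
    return [0.9] if img[0][0] == 1 else [0.2]


table_img = [[0, 0, 1],
             [0, 0, 0]]
angle, score, upright = best_rotation(table_img, fake_ocr)
```

In a real pipeline the stand-in would be replaced by the OCR engine, and the full OCR pass would then run once more on `upright`, matching the rerun step described above.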
This feature is **enabled by default**. You can control it via an environment variable:
```bash
# Disable table auto-rotation
export TABLE_AUTO_ROTATE=false

# Enable table auto-rotation (default)
export TABLE_AUTO_ROTATE=true
```

Or via an API parameter:
```python
from deepdoc.parser import PdfParser

pdf_path = "path/to/scanned.pdf"  # placeholder; replace with your own file

parser = PdfParser()
# Disable auto-rotation for this call
boxes, tables = parser(pdf_path, auto_rotate_tables=False)
```
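
How the two switches interact is not spelled out here. The sketch below shows one plausible way to resolve them, assuming that an explicit argument takes precedence over the environment variable and that common "falsy" strings disable the feature. `resolve_auto_rotate` is a hypothetical helper written for illustration; it is not part of `deepdoc`.

```python
import os
from typing import Mapping, Optional

# Assumption: these spellings count as "disabled"; anything else enables.
_FALSE_VALUES = {"0", "false", "no", "off"}


def resolve_auto_rotate(
    explicit: Optional[bool] = None,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether table auto-rotation should run.

    Assumed precedence: explicit argument, then TABLE_AUTO_ROTATE,
    then the documented default (enabled).
    """
    if explicit is not None:
        return explicit
    raw = env.get("TABLE_AUTO_ROTATE")
    if raw is None:
        return True  # enabled by default
    return raw.strip().lower() not in _FALSE_VALUES
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.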

<a name="3"></a>
## 3. Parser
