[Feat] Automatic table orientation detection and correction (#12719)

### What problem does this PR solve?
This PR introduces automatic table orientation detection and correction
within the PDF parser. This ensures that tables in PDFs are correctly
oriented before structure recognition, improving overall parsing
accuracy.

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
zhanxin.xu
2026-01-22 12:47:55 +08:00
committed by GitHub
parent 2d9e7b4acd
commit 93091f4551
3 changed files with 330 additions and 22 deletions

@@ -103,6 +103,31 @@ We use vision information to resolve problems as human being.
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/cb24e81b-f2ba-49f3-ac09-883d75606f4c" width="1000"/>
</div>
- **Table Auto-Rotation**. In scanned PDFs, tables may be incorrectly oriented (rotated by 90°, 180°, or 270°).
Before table structure recognition, the PDF parser evaluates four rotation angles (0°, 90°, 180°, 270°)
and selects the one with the highest OCR confidence score. It then re-runs OCR on the table image in that
orientation, which improves OCR accuracy and table structure detection for rotated tables.
This feature is **enabled by default**. You can control it via an environment variable:
```bash
# Disable table auto-rotation
export TABLE_AUTO_ROTATE=false
# Enable table auto-rotation (default)
export TABLE_AUTO_ROTATE=true
```
Or via an API parameter:
```python
from deepdoc.parser import PdfParser

parser = PdfParser()
# Placeholder path; replace with your own PDF file
pdf_path = "path/to/document.pdf"
# Disable auto-rotation for this call
boxes, tables = parser(pdf_path, auto_rotate_tables=False)
```
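The selection step described above (score each of the four angles, keep the most confident one) can be sketched roughly as follows. This is an illustration only, not the parser's actual code: the names `rotate_90_ccw`, `rotate`, and `pick_best_rotation` are hypothetical, the image is a plain nested list standing in for a real table crop, the rotation direction is an assumption, and `ocr_confidence` is whatever scoring callable you supply.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a real image: rows of pixel values.
Image = List[List[int]]


def rotate_90_ccw(image: Image) -> Image:
    """Rotate a row-major grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def rotate(image: Image, angle: int) -> Image:
    """Rotate by a multiple of 90 degrees (counter-clockwise, by assumption)."""
    if angle not in (0, 90, 180, 270):
        raise ValueError(f"unsupported angle: {angle}")
    rotated = [list(row) for row in image]
    for _ in range(angle // 90):
        rotated = rotate_90_ccw(rotated)
    return rotated


def pick_best_rotation(
    image: Image,
    ocr_confidence: Callable[[Image], float],
    angles: Sequence[int] = (0, 90, 180, 270),
) -> Tuple[int, float]:
    """Return (angle, score) for the candidate with the highest OCR confidence.

    `ocr_confidence` is a caller-supplied scorer (an assumption of this sketch).
    On ties the earliest angle wins, so an upright table stays at 0 degrees.
    """
    best_angle, best_score = angles[0], float("-inf")
    for angle in angles:
        score = ocr_confidence(rotate(image, angle))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score
```

Using a strict comparison keeps 0° as the default when no rotation scores higher, which avoids rotating tables that are already upright. After the best angle is chosen, OCR would be run once more on the image rotated by that angle, as the description above states.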
<a name="3"></a>
## 3. Parser