ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2025-12-29 16:05:35 +08:00

Files

Yongteng Lei 51bc41b2e8 Refa: improve image table context (#12244)

### What problem does this PR solve?

Improve how surrounding text context is attached to image and table chunks.

Current strategy in attach_media_context:

- Order by position when possible: if any chunk has page/position info,
sort by (page, top, left), otherwise keep original order.
- Apply only to media chunks: images use image_context_size, tables use
table_context_size.
- Primary matching: on the same page, consider the text chunks whose
vertical span overlaps the media, then pick the one with the closest
vertical midpoint.
- Fallback matching: if no text chunk overlaps on that page, choose the
nearest text chunk on the same page (media at the head of a page uses
the next text chunk; media at the tail uses the previous one).
- Context extraction: inside the chosen text chunk, find a sentence
boundary near the text midpoint, then take context_size tokens in
total, split between the text before and after that boundary.
- No multi-chunk stitching: context comes from a single text chunk to
avoid mixing unrelated segments.
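
The strategy above can be sketched in a few lines of Python. This is a minimal illustration, not the code from this PR: the `Chunk` record, its field names, and the helper functions are all hypothetical, tokens are approximated by whitespace splitting, and a sentence boundary is assumed to follow any token ending in `.`, `!`, or `?`. The fallback is modelled as "closest vertical midpoint on the same page", which for media at the head of a page selects the next text chunk and for media at the tail selects the previous one. Media without page info is simply skipped here, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    """Hypothetical chunk record; the field names are illustrative only."""
    kind: str                    # "text", "image", or "table"
    text: str = ""
    page: Optional[int] = None   # None means no position info
    top: float = 0.0
    bottom: float = 0.0
    left: float = 0.0


def order_chunks(chunks: List[Chunk]) -> List[Chunk]:
    """Sort by (page, top, left) if any chunk is positioned, else keep order."""
    if all(c.page is None for c in chunks):
        return list(chunks)
    # Assumption: unpositioned chunks sort last; sorted() is stable.
    return sorted(chunks, key=lambda c: (
        float("inf") if c.page is None else c.page, c.top, c.left))


def pick_text_chunk(media: Chunk, chunks: List[Chunk]) -> Optional[Chunk]:
    """Pick one text chunk on the media's page: overlap first, then nearest."""
    if media.page is None:       # assumption: skip media without position info
        return None
    same_page = [c for c in chunks if c.kind == "text" and c.page == media.page]
    if not same_page:
        return None
    overlapping = [c for c in same_page
                   if c.top < media.bottom and c.bottom > media.top]
    media_mid = (media.top + media.bottom) / 2
    # Primary: overlapping chunks only. Fallback: every text chunk on the page.
    candidates = overlapping or same_page
    return min(candidates, key=lambda c: abs((c.top + c.bottom) / 2 - media_mid))


def extract_context(text: str, context_size: int) -> str:
    """Take context_size tokens around the sentence boundary nearest the middle."""
    tokens = text.split()        # assumption: whitespace tokens
    if context_size <= 0 or not tokens:
        return ""
    middle = len(tokens) // 2
    boundaries = [i for i in range(1, len(tokens))
                  if tokens[i - 1].endswith((".", "!", "?"))]
    pivot = min(boundaries, key=lambda i: abs(i - middle)) if boundaries else middle
    before = context_size // 2   # the total budget is split before/after
    after = context_size - before
    return " ".join(tokens[max(0, pivot - before):pivot + after])


def sketch_media_context(chunks: List[Chunk], image_context_size: int = 0,
                         table_context_size: int = 0) -> List[Tuple[Chunk, str]]:
    """Return (media chunk, context) pairs; context comes from a single chunk."""
    sizes = {"image": image_context_size, "table": table_context_size}
    ordered = order_chunks(chunks)
    pairs = []
    for chunk in ordered:
        size = sizes.get(chunk.kind, 0)
        if size <= 0:            # text chunks and disabled media types
            continue
        match = pick_text_chunk(chunk, ordered)
        if match is not None:
            pairs.append((chunk, extract_context(match.text, size)))
    return pairs
```

For example, an image spanning 50 to 150 on page 1 next to a text chunk spanning 0 to 100 with the text `A one. B two. C three. D four.` and `image_context_size=4` yields the context `B two. C three.`: the boundary closest to the midpoint falls after `two.`, and two tokens are taken on each side.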

### Type of change

- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>

2025-12-26 17:55:32 +08:00

__init__.py

Refa: improve image table context (#12244)

2025-12-26 17:55:32 +08:00

query.py

Feat: message manage (#12196)

2025-12-25 21:18:13 +08:00

rag_tokenizer.py

Fix: tokenizer issue. (#11902)

2025-12-11 17:38:17 +08:00

search.py

Feat: message manage (#12196)