Refa: fix and explain GraphRAG stalling behavior on large files (#8223)

### What problem does this PR solve?

This PR investigates the cause of #7957.

TL;DR: Incorrect similarity calculations lead to too many candidates.
Since candidate selection involves interaction with the LLM, this causes
significant delays in the program.

What this PR does:

1. **Fix similarity calculation**:
When processing a 64-page government document, the corrected similarity
calculation reduces the number of candidates from over 100,000 to around
16,000. With a default batch size of 100 pairs per LLM call, this fix
reduces unnecessary LLM interactions from over 1,000 calls to around
160, a roughly 10x improvement.
2. **Add concurrency and timeout limits**: 
Up to 5 entity types are processed in "parallel", each with a 180-second
timeout. These limits may be configurable in future updates.
3. **Improve logging**:
The candidate resolution process now reports progress in real time.
4. **Mitigate potential concurrency risks**
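The batch arithmetic in item 1 can be sketched with a small helper. `chunk_pairs` and its default of 100 pairs per call are illustrative stand-ins, not RAGFlow's actual API:

```python
from typing import Iterable, List, Tuple

def chunk_pairs(pairs: List[Tuple[str, str]], batch_size: int = 100) -> Iterable[List[Tuple[str, str]]]:
    """Split candidate entity pairs into fixed-size batches, one LLM call per batch."""
    for i in range(0, len(pairs), batch_size):
        yield pairs[i:i + batch_size]

# ~16,000 candidate pairs at 100 pairs per LLM call -> 160 calls,
# versus 1,000+ calls for the old, inflated 100,000+ candidates.
pairs = [(f"e{i}", f"e{i + 1}") for i in range(16_000)]
batches = list(chunk_pairs(pairs))
print(len(batches))  # 160
```

Since every batch costs one LLM round trip, shrinking the candidate set shrinks the call count linearly, which is where the roughly 10x speedup comes from.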

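The limits in item 2 can be sketched with stdlib `asyncio` (the executor itself is trio-based, so this is an equivalent, hypothetical sketch rather than the PR's code): a semaphore caps concurrent entity types at 5, and `wait_for` enforces the 180-second per-type budget.

```python
import asyncio

MAX_CONCURRENT_TYPES = 5   # up to 5 entity types in flight at once
PER_TYPE_TIMEOUT = 180.0   # seconds allowed per entity type

async def resolve_entity_type(entity_type: str, sem: asyncio.Semaphore, results: dict) -> None:
    """Hypothetical per-type candidate resolution, concurrency-limited and time-boxed."""
    async with sem:
        try:
            # stand-in for the LLM-backed candidate resolution work
            await asyncio.wait_for(asyncio.sleep(0.01), timeout=PER_TYPE_TIMEOUT)
            results[entity_type] = "done"
        except asyncio.TimeoutError:
            results[entity_type] = "timed out"

async def resolve_all(entity_types, results: dict) -> None:
    sem = asyncio.Semaphore(MAX_CONCURRENT_TYPES)
    await asyncio.gather(*(resolve_entity_type(t, sem, results) for t in entity_types))

results: dict = {}
asyncio.run(resolve_all(["PERSON", "ORG", "GPE", "EVENT", "LAW"], results))
print(results["PERSON"])  # done
```

A timed-out type is recorded and skipped rather than cancelling its siblings, which is the behavior the PR description implies for slow entity types.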

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
Commit 24ca4cc6b7 (parent d36c8d18b1), authored by Yongteng Lei on 2025-06-12 19:09:50 +08:00, committed by GitHub.
3 changed files with 90 additions and 16 deletions.


```diff
@@ -21,6 +21,8 @@ import sys
 import threading
 import time
+from valkey import RedisError
 from api.utils.log_utils import initRootLogger, get_project_base_directory
 from graphrag.general.index import run_graphrag
 from graphrag.utils import get_llm_cache, set_llm_cache, get_tags_from_cache, set_tags_to_cache
```
```diff
@@ -187,18 +189,44 @@ async def collect():
     global CONSUMER_NAME, DONE_TASKS, FAILED_TASKS
     global UNACKED_ITERATOR
     svr_queue_names = get_svr_queue_names()
+    redis_msg = None
     try:
         if not UNACKED_ITERATOR:
-            UNACKED_ITERATOR = REDIS_CONN.get_unacked_iterator(svr_queue_names, SVR_CONSUMER_GROUP_NAME, CONSUMER_NAME)
-        try:
-            redis_msg = next(UNACKED_ITERATOR)
-        except StopIteration:
-            UNACKED_ITERATOR = None
+            logging.debug("Rebuilding UNACKED_ITERATOR due to it is None")
+            try:
+                UNACKED_ITERATOR = REDIS_CONN.get_unacked_iterator(svr_queue_names, SVR_CONSUMER_GROUP_NAME, CONSUMER_NAME)
+                logging.debug("UNACKED_ITERATOR rebuilt successfully")
+            except RedisError as e:
+                UNACKED_ITERATOR = None
+                logging.warning(f"Failed to rebuild UNACKED_ITERATOR: {e}")
+        if UNACKED_ITERATOR:
+            try:
+                redis_msg = next(UNACKED_ITERATOR)
+            except StopIteration:
+                UNACKED_ITERATOR = None
+                logging.debug("UNACKED_ITERATOR exhausted, clearing")
+            except Exception as e:
+                UNACKED_ITERATOR = None
+                logging.warning(f"UNACKED_ITERATOR raised exception: {e}")
         if not redis_msg:
             for svr_queue_name in svr_queue_names:
-                redis_msg = REDIS_CONN.queue_consumer(svr_queue_name, SVR_CONSUMER_GROUP_NAME, CONSUMER_NAME)
-                if redis_msg:
-                    break
-    except Exception:
-        logging.exception("collect got exception")
+                try:
+                    redis_msg = REDIS_CONN.queue_consumer(svr_queue_name, SVR_CONSUMER_GROUP_NAME, CONSUMER_NAME)
+                    if redis_msg:
+                        break
+                except RedisError as e:
+                    logging.warning(f"queue_consumer failed for {svr_queue_name}: {e}")
+                    continue
+    except Exception as e:
+        logging.exception(f"collect task encountered unexpected exception: {e}")
+        UNACKED_ITERATOR = None
+        await trio.sleep(1)
+        return None, None
     if not redis_msg:
```
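The `collect()` changes boil down to a three-step pattern: rebuild the unacked iterator if it is missing, drain it first, then fall back to polling the queue consumers, degrading gracefully on errors at each step. A synchronous, self-contained sketch of that pattern (all names are hypothetical stand-ins, not RAGFlow's API):

```python
import logging

def collect_next(unacked_iter, rebuild_iter, queue_consumers):
    """Return (message, iterator): drain the unacked iterator first, rebuilding it
    when missing, then fall back to polling each queue consumer in turn."""
    msg = None
    if unacked_iter is None:
        try:
            unacked_iter = rebuild_iter()
        except Exception as e:
            logging.warning("failed to rebuild iterator: %s", e)
            unacked_iter = None
    if unacked_iter is not None:
        try:
            msg = next(unacked_iter)
        except StopIteration:
            unacked_iter = None  # exhausted; fall through to the queues
    if msg is None:
        for consume in queue_consumers:
            try:
                msg = consume()
            except Exception as e:
                logging.warning("queue consumer failed: %s", e)
                continue
            if msg is not None:
                break
    return msg, unacked_iter
```

The key property, as in the patch, is that a failure in any one source (rebuild, iteration, or a single queue) is logged and skipped instead of aborting the whole collection step.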