Fix: split process bug in graphrag extract (#6423)

### What problem does this PR solve?

1. The completion delimiter was not handled when splitting the extraction output.
2. Bracket processing was missing.
3. The `doc_ids` returned by `update_graph` is a set, but the insert operation in
`extract_community` needs a list.
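
The three fixes above can be illustrated with a minimal sketch. It is not the code changed in this PR: the delimiter values are assumptions modeled on common GraphRAG prompt defaults, and `split_records` and `to_id_list` are hypothetical helper names introduced only for this example.

```python
from __future__ import annotations

import json
import re

# Assumed delimiter values (hypothetical for this repository); the real
# prompts may configure different strings.
RECORD_DELIMITER = "##"
COMPLETION_DELIMITER = "<|COMPLETE|>"
TUPLE_DELIMITER = "<|>"


def split_records(raw: str) -> list[list[str]]:
    """Split raw extraction output into records, each a list of fields."""
    # Fix 1: split on the completion delimiter as well as the record
    # delimiter, so the trailing marker never leaks into the last record.
    pattern = "|".join(
        re.escape(d) for d in (RECORD_DELIMITER, COMPLETION_DELIMITER)
    )
    records = []
    for piece in re.split(pattern, raw):
        # Fix 2: keep only the text inside the outermost parentheses and
        # skip pieces that contain no bracketed record at all.
        match = re.search(r"\((.*)\)", piece, re.DOTALL)
        if match is None:
            continue
        fields = [f.strip() for f in match.group(1).split(TUPLE_DELIMITER)]
        records.append(fields)
    return records


def to_id_list(doc_ids: set[str]) -> list[str]:
    """Fix 3: convert a set of document IDs to a list before storing it.

    Sorting is an extra choice made here for deterministic output; the
    diff in this PR uses a plain list(doc_ids).
    """
    return sorted(doc_ids)


if __name__ == "__main__":
    raw = '("entity"<|>ALICE<|>PERSON<|>An engineer)##("entity"<|>ACME<|>ORG<|>A company)<|COMPLETE|>'
    print(split_records(raw))
    # A set cannot be serialized by json.dumps (it raises TypeError),
    # which is one reason a storage layer may require a list.
    print(json.dumps({"source_id": to_id_list({"doc-2", "doc-1"})}))
```

Whether the storage layer in this project fails on a set for the same reason as `json.dumps` is an assumption; the commit message only states that the insert operation needs a list.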


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
utopia2077
2025-03-24 21:41:20 +08:00
committed by GitHub
parent a40c5aea83
commit 390086c6ab
2 changed files with 16 additions and 9 deletions


@@ -343,7 +343,7 @@ async def extract_community(
"entities_kwd": stru["entities"],
"important_kwd": stru["entities"],
"kb_id": kb_id,
- "source_id": doc_ids,
+ "source_id": list(doc_ids),
"available_int": 0,
}
chunk["content_sm_ltks"] = rag_tokenizer.fine_grained_tokenize(