Commit Graph

28 Commits

Author SHA1 Message Date
56cd576876 Refa: revise the implementation of LightRAG and enable response caching (#9828)
### What problem does this PR solve?

This revision performed a comprehensive check on LightRAG to ensure the
correctness of its implementation. It **did not involve** Entity
Resolution and Community Reports Generation. There is an example using
default entity types and the General chunking method, which shows good
results in both time and effectiveness. Moreover, response caching is
enabled for resuming failed tasks.


[The-Necklace.pdf](https://github.com/user-attachments/files/22042432/The-Necklace.pdf)

After:


![img_v3_02pk_177dbc6a-e7cc-4732-b202-ad4682d171fg](https://github.com/user-attachments/assets/5ef1d93a-9109-4fe9-8a7b-a65add16f82b)


```bash
Begin at:
Fri, 29 Aug 2025 16:48:03 GMT
Duration:
222.31 s
Progress:
16:48:04 Task has been received.
16:48:06 Page(1~7): Start to parse.
16:48:06 Page(1~7): OCR started
16:48:08 Page(1~7): OCR finished (1.89s)
16:48:11 Page(1~7): Layout analysis (3.72s)
16:48:11 Page(1~7): Table analysis (0.00s)
16:48:11 Page(1~7): Text merged (0.00s)
16:48:11 Page(1~7): Finish parsing.
16:48:12 Page(1~7): Generate 7 chunks
16:48:12 Page(1~7): Embedding chunks (0.29s)
16:48:12 Page(1~7): Indexing done (0.04s). Task done (7.84s)
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: She had no dresses, no je...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: Her husband, already half...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: And this life lasted ten ...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: Then she asked, hesitatin...
16:49:30 Completed processing for f421fb06849e11f0bdd32724b93a52b2: She had no dresses, no je... after 1 gleanings, 21985 tokens.
16:49:30 Entities extraction of chunk 3 1/7 done, 12 nodes, 13 edges, 21985 tokens.
16:49:40 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Finally, she replied, hes... after 1 gleanings, 22584 tokens.
16:49:40 Entities extraction of chunk 5 2/7 done, 19 nodes, 19 edges, 22584 tokens.
16:50:02 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Then she asked, hesitatin... after 1 gleanings, 24610 tokens.
16:50:02 Entities extraction of chunk 0 3/7 done, 16 nodes, 28 edges, 24610 tokens.
16:50:03 Completed processing for f421fb06849e11f0bdd32724b93a52b2: And this life lasted ten ... after 1 gleanings, 24031 tokens.
16:50:04 Entities extraction of chunk 1 4/7 done, 24 nodes, 22 edges, 24031 tokens.
16:50:14 Completed processing for f421fb06849e11f0bdd32724b93a52b2: So they begged the jewell... after 1 gleanings, 24635 tokens.
16:50:14 Entities extraction of chunk 6 5/7 done, 27 nodes, 26 edges, 24635 tokens.
16:50:29 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Her husband, already half... after 1 gleanings, 25758 tokens.
16:50:29 Entities extraction of chunk 2 6/7 done, 25 nodes, 35 edges, 25758 tokens.
16:51:35 Completed processing for f421fb06849e11f0bdd32724b93a52b2: The Necklace By Guy de Ma... after 1 gleanings, 27491 tokens.
16:51:35 Entities extraction of chunk 4 7/7 done, 39 nodes, 37 edges, 27491 tokens.
16:51:35 Entities and relationships extraction done, 147 nodes, 177 edges, 171094 tokens, 198.58s.
16:51:35 Entities merging done, 0.01s.
16:51:35 Relationships merging done, 0.01s.
16:51:35 ignored 7 relations due to missing entities.
16:51:35 generated subgraph for doc f421fb06849e11f0bdd32724b93a52b2 in 198.68 seconds.
16:51:35 run_graphrag f421fb06849e11f0bdd32724b93a52b2 graphrag_task_lock acquired
16:51:35 set_graph removed 0 nodes and 0 edges from index in 0.00s.
16:51:35 Get embedding of nodes: 9/147
16:51:35 Get embedding of nodes: 109/147
16:51:37 Get embedding of edges: 9/170
16:51:37 Get embedding of edges: 109/170
16:51:40 set_graph converted graph change to 319 chunks in 4.21s.
16:51:40 Insert chunks: 4/319
16:51:40 Insert chunks: 104/319
16:51:40 Insert chunks: 204/319
16:51:40 Insert chunks: 304/319
16:51:40 set_graph added/updated 147 nodes and 170 edges from index in 0.53s.
16:51:40 merging subgraph for doc f421fb06849e11f0bdd32724b93a52b2 into the global graph done in 4.79 seconds.
16:51:40 Knowledge Graph done (204.29s)
```

Before:


![img_v3_02pk_63370edf-ecee-4ee8-8ac8-69c8d2c712fg](https://github.com/user-attachments/assets/1162eb0f-68c2-4de5-abe0-cdfa168f71de)

```bash
Begin at:
Fri, 29 Aug 2025 17:00:47 GMT
processDuration:
173.38 s
Progress:
17:00:49 Task has been received.
17:00:51 Page(1~7): Start to parse.
17:00:51 Page(1~7): OCR started
17:00:53 Page(1~7): OCR finished (1.82s)
17:00:57 Page(1~7): Layout analysis (3.64s)
17:00:57 Page(1~7): Table analysis (0.00s)
17:00:57 Page(1~7): Text merged (0.00s)
17:00:57 Page(1~7): Finish parsing.
17:00:57 Page(1~7): Generate 7 chunks
17:00:57 Page(1~7): Embedding chunks (0.31s)
17:00:57 Page(1~7): Indexing done (0.03s). Task done (7.88s)
17:00:57 created task graphrag
17:01:00 Task has been received.
17:02:17 Entities extraction of chunk 1 1/7 done, 9 nodes, 9 edges, 10654 tokens.
17:02:31 Entities extraction of chunk 2 2/7 done, 12 nodes, 13 edges, 11066 tokens.
17:02:33 Entities extraction of chunk 4 3/7 done, 9 nodes, 10 edges, 10433 tokens.
17:02:42 Entities extraction of chunk 5 4/7 done, 11 nodes, 14 edges, 11290 tokens.
17:02:52 Entities extraction of chunk 6 5/7 done, 13 nodes, 15 edges, 11039 tokens.
17:02:55 Entities extraction of chunk 3 6/7 done, 14 nodes, 13 edges, 11466 tokens.
17:03:32 Entities extraction of chunk 0 7/7 done, 19 nodes, 18 edges, 13107 tokens.
17:03:32 Entities and relationships extraction done, 71 nodes, 89 edges, 79055 tokens, 149.66s.
17:03:32 Entities merging done, 0.01s.
17:03:32 Relationships merging done, 0.01s.
17:03:32 ignored 1 relations due to missing entities.
17:03:32 generated subgraph for doc b1d9d3b6848711f0aacd7ddc0714c4d3 in 149.69 seconds.
17:03:32 run_graphrag b1d9d3b6848711f0aacd7ddc0714c4d3 graphrag_task_lock acquired
17:03:32 set_graph removed 0 nodes and 0 edges from index in 0.00s.
17:03:32 Get embedding of nodes: 9/71
17:03:33 Get embedding of edges: 9/88
17:03:34 set_graph converted graph change to 161 chunks in 2.27s.
17:03:34 Insert chunks: 4/161
17:03:34 Insert chunks: 104/161
17:03:34 set_graph added/updated 71 nodes and 88 edges from index in 0.28s.
17:03:34 merging subgraph for doc b1d9d3b6848711f0aacd7ddc0714c4d3 into the global graph done in 2.60 seconds.
17:03:34 Knowledge Graph done (153.18s)

```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
- [x] Performance Improvement
2025-08-29 17:58:36 +08:00
8d8a5f73b6 Fix: meta data filter with AND logic operations. (#9687)
### What problem does this PR solve?

Close #9648

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-25 18:29:24 +08:00
2f74727bb9 Fix: meta data error. (#9670)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-25 09:41:52 +08:00
ca720bd811 Fix: save team's canvas issue. (#9518)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-18 13:05:29 +08:00
2114e966d8 Feat: add citation option to agent and enlarge the timeouts. (#9484)
### What problem does this PR solve?

#9422

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-08-15 10:05:01 +08:00
c783d90ba3 Perf: set timeout for building chunks. (#8940)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-07-21 15:56:45 +08:00
ecdb1701df Perf: test llm before RAPTOR. (#8897)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-07-17 16:48:50 +08:00
606bf20a3f Fix: parameter missing. (#8895)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-17 16:06:22 +08:00
729e6098f9 Refa: add more logs to KG. (#8889)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-07-17 14:43:08 +08:00
fbd115773b Perf: set timeout of some steps in KG. (#8873)
### What problem does this PR solve?

### Type of change


- [x] Performance Improvement
2025-07-16 18:06:03 +08:00
c642dbefca Perf: Enhance timeout handling. (#8826)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-07-15 09:36:45 +08:00
e3edcc3064 Trivals. (#8597)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-01 14:05:18 +08:00
2337bbf6ca Perf: pass useless check for tidy graph (#8121)
### What problem does this PR solve?
Support passing the attribute check when the upstream has already made
sure it.

### Type of change
- [X] Performance Improvement
2025-06-09 11:44:13 +08:00
a71376ad6a Fix: KeyError: 'method' when build run_graphrag (#7899)
### What problem does this PR solve?
Close #7879
I checked the current master code, the kb_parser_config is join from
knowledge table, so I think should be some edge cases due to history
data

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-28 11:46:41 +08:00
4ae8f87754 Fix: missing graph resolution and community extraction in graphrag tasks (#7586)
### What problem does this PR solve?

Info of whether applying graph resolution and community extraction is
storage in `task["kb_parser_config"]`. However, previous code get
`graphrag_conf` from `task["parser_config"]`, making `with_resolution`
and `with_community` are always false.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-05-13 09:21:03 +08:00
ab27609a64 Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151)
### What problem does this PR solve?

When you removed any document in a knowledge base using knowledge graph,
the graph's `removed_kwd` is set to "Y".
However, in the function `graphrag.utils.get_gaph`, `rebuild_graph`
method is passed and directly return `None` while `removed_kwd=Y`,
making residual part of the graph abandoned (but old entity data still
exist in db).

Besides, infinity instance actually pass deleting graph components'
`source_id` when removing document. It may cause wrong graph after
rebuild.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-30 09:43:17 +08:00
b1798bafb0 Fix: handle sometimes graph index will miss explanation (#7127)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7053

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-18 14:24:36 +08:00
d64c6870bb Fix:When parsing documents with graph, an error occurred:[ERROR][Exception]: 'method' (#6836)
[When parsing documents with graph, an error
occurred:[ERROR][Exception]: 'method']
(https://github.com/infiniflow/ragflow/issues/6835)
### What problem does this PR solve?

Close #6786

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: cm <caiming@sict.ac.cn>
2025-04-07 12:29:25 +08:00
fdc410e743 Fix set_graph on non-existing edge (#6777)
### What problem does this PR solve?

Fix set_graph on non-existing edge

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-03 11:09:04 +08:00
e7a2a4b7ff Log llm response on exception (#6750)
### What problem does this PR solve?

Log llm response on exception

### Type of change

- [x] Refactoring
2025-04-02 17:10:57 +08:00
36b62e0fab EntityResolution batch. Close #6570 (#6602)
### What problem does this PR solve?

EntityResolution batch

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-27 16:40:36 +08:00
c4998d0e09 Rename graphrag task lock (#6576)
### What problem does this PR solve?

Rename graphrag task lock

### Type of change

- [x] Refactoring
2025-03-26 23:48:47 +08:00
6bf26e2a81 Optimize graphrag again (#6513)
### What problem does this PR solve?

Removed set_entity and set_relation to avoid accessing doc engine during
graph computation.
Introduced GraphChange to avoid writing unchanged chunks.

### Type of change

- [x] Performance Improvement
2025-03-26 15:34:42 +08:00
390086c6ab Fix: split process bug in graphrag extract (#6423)
### What problem does this PR solve?

1. miss completion delimiter.
2. miss bracket process.
3. doc_ids return by update_graph is a set, and insert operation in
extract_community need a list.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-24 21:41:20 +08:00
939e668096 Optimized graphrag again (#5927)
### What problem does this PR solve?

Optimized graphrag again

### Type of change

- [x] Performance Improvement
2025-03-11 18:36:10 +08:00
6ec6ca6971 Refactor graphrag to remove redis lock (#5828)
### What problem does this PR solve?

Refactor graphrag to remove redis lock

### Type of change

- [x] Refactoring
2025-03-10 15:15:06 +08:00
c813c1ff4c Made task_executor async to speedup parsing (#5530)
### What problem does this PR solve?

Made task_executor async to speedup parsing

### Type of change

- [x] Performance Improvement
2025-03-03 18:59:49 +08:00
dd0ebbea35 Light GraphRAG (#4585)
### What problem does this PR solve?

#4543

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-01-22 19:43:14 +08:00