Commit Graph

35 Commits

Author SHA1 Message Date
d9fe279dde Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?

#9082 #6365

<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
dd0ebbea35 Light GraphRAG (#4585)
### What problem does this PR solve?

#4543

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-01-22 19:43:14 +08:00
3894de895b Update comments (#4569)
### What problem does this PR solve?

Add license statement.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-01-21 20:52:28 +08:00
0d68a6cd1b Fix errors detected by Ruff (#3918)
### What problem does this PR solve?

Fix errors detected by Ruff

### Type of change

- [x] Refactoring
2024-12-08 14:21:12 +08:00
e079656473 Update progress info and start welcome info (#3768)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Refactoring

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-30 18:48:06 +08:00
30f6421760 Use consistent log file names, introduced initLogger (#3403)
### What problem does this PR solve?

Use consistent log file names, introduced initLogger

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-11-14 17:13:48 +08:00
a2a5631da4 Rework logging (#3358)
Unified all log files into one.

### What problem does this PR solve?

Unified all log files into one.

### Type of change

- [x] Refactoring
2024-11-12 17:35:13 +08:00
570ad420a8 remove unused import (#2679)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-09-30 16:59:39 +08:00
fc867cb959 rename get_txt to get_text (#2649)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-29 12:47:09 +08:00
aea553c3a8 Add get_txt function (#2639)
### What problem does this PR solve?

Add get_txt function to reduce duplicate code

### Type of change

- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-09-29 10:29:56 +08:00
6b3a40be5c Format file format from Windows/dos to Unix (#1949)
### What problem does this PR solve?

Related source file is in Windows/DOS format, they are format to Unix
format.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-08-15 09:17:36 +08:00
0171082cc5 fix create dialog bug (#982)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 09:25:05 +08:00
8dd45459be Add support for HTML file (#973)
### What problem does this PR solve?

Add support for HTML file

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-30 09:12:55 +08:00
7013d7f620 refine text decode (#657)
### What problem does this PR solve?
#651 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-07 12:25:47 +08:00
8c07992b6c refine code (#595)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-28 19:13:33 +08:00
f1c98aad6b Update version info (#564)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-04-26 20:07:26 +08:00
369400c483 fix bug of table in docx (#510)
### What problem does this PR solve?
#509 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-23 19:10:33 +08:00
72384b191d Add .doc file parser. (#497)
### What problem does this PR solve?
Add `.doc` file parser, using tika.
```
pip install tika
```
```
from tika import parser
from io import BytesIO

def extract_text_from_doc_bytes(doc_bytes):
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    return parsed["content"]
```
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
2024-04-23 15:31:43 +08:00
0dfc8ddc0f enlarge docker memory usage (#501)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-23 14:41:10 +08:00
a38e163035 remove doc from supported processing types (#488)
### What problem does this PR solve?
#474 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-22 15:46:09 +08:00
ed6081845a Fit a lot of encodings for text file. (#458)
### What problem does this PR solve?

#384

### Type of change

- [x] Performance Improvement
2024-04-19 18:02:53 +08:00
f6c7204002 refine log format (#312)
### What problem does this PR solve?

Issue link:#264
### Type of change


- [x] Documentation Update
- [x] Refactoring
2024-04-11 10:13:43 +08:00
fd7fcb5baf apply pep8 formalize (#155) 2024-03-27 11:33:46 +08:00
f6aee7f230 add use layout or not option (#145)
* add use layout or not option

* trival
2024-03-22 19:21:09 +08:00
602038ac49 fix task cancling bug (#98) 2024-03-05 16:33:47 +08:00
8a57f2afd5 change callback strategy, add timezone to docker (#96) 2024-03-05 12:08:41 +08:00
7bfaf0df29 fix position extraction bug (#93)
* fix position extraction bug

* remove delimiter for naive parser
2024-03-04 17:08:35 +08:00
685b4d8a95 fix table desc bugs, add positions to chunks (#91) 2024-03-04 14:42:26 +08:00
8a726fb04b solve task execution issues (#90) 2024-03-01 19:48:01 +08:00
7fd1eca582 init README of deepdoc, add picture processer. (#71)
* init README of deepdoc, add picture processer.

* add resume parsing
2024-02-23 18:28:12 +08:00
cacd36c5e1 use onnx models, new deepdoc (#68) 2024-02-21 16:32:38 +08:00
a8294f2168 Refine resume parts and fix bugs in retrival using sql (#66) 2024-02-19 19:22:17 +08:00
407b2523b6 remove unused codes, seperate layout detection out as a new api. Add new rag methed 'table' (#55) 2024-02-05 18:08:17 +08:00
51482f3e2a Some document API refined. (#53)
Add naive chunking method to RAG
2024-02-02 19:21:37 +08:00
e6acaf6738 Add Q&A and Book, fix task running bugs (#50) 2024-02-01 18:53:56 +08:00