Commit Graph

91 Commits

Author SHA1 Message Date
fe797bcc66 be better chunks before graphrag (#1811)
### What problem does this PR solve?

#1594

### Type of change

- [x] Refactoring
2024-08-05 16:21:52 +08:00
a5c03ccd4c refine mindmap prompt (#1808)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-05 15:33:44 +08:00
H
d2213141e0 Fix graphrag callback (#1806)
### What problem does this PR solve?

#1800 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-05 14:44:54 +08:00
152072f900 Add graphrag (#1793)
### What problem does this PR solve?

#1594

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-08-02 18:51:14 +08:00
a973b9e01f Fix: Embedding err when docx contains unsupported images (#1720)
### What problem does this PR solve?

Fix the problem of not being able to embedding when docx document
contains unsupported images.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-29 19:38:47 +08:00
H
0cb588f7bf Fix docx parser line bug (#1715)
### What problem does this PR solve?
#1704 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-29 10:06:02 +08:00
cHz
4b195cc14c fix: Misspelled Variable Name (#1662)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-24 11:14:46 +08:00
H
ac7a0d4fbf Add ParsertType Audio (#1637)
### What problem does this PR solve?

#1514 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-22 19:17:30 +08:00
92e9320657 upgrade laws parser of docx (#1332)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-07-01 15:50:24 +08:00
fc7cc1d36c Optimize docx handle method in laws parser (#1302)
### What problem does this PR solve?

Optimize docx handle method in laws parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 17:42:59 +08:00
a95c1d45f0 Support table for markdown file in general parser (#1278)
### What problem does this PR solve?

Support extracting table for markdown file in general parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-27 14:38:35 +08:00
b75bb1d8d3 Support displaying tables in the chunks of pdf file when using QA parser (#1263)
### What problem does this PR solve?

Support displaying tables in the chunks of pdf file when using QA parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 19:02:18 +08:00
38bd02f402 Support displaying images in the chunks of docx files when using general parser (#1253)
### What problem does this PR solve?

Support displaying images in chunks of docx files when using general
parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 16:29:36 +08:00
f8fe4154e8 Place pdf's image at the correct position in QA parser (#1235)
### What problem does this PR solve?

Place pdf's image at the correct position in QA parser

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-24 10:41:03 +08:00
15bf9f8c25 refine code to prevent exception (#1231)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-06-21 14:06:46 +08:00
18f4a6b35c feat: support json file (#1217)
### What problem does this PR solve?

feat: support json file.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-06-21 10:42:29 +08:00
3c1444ab19 Add docx support for manual parser (#1227)
### What problem does this PR solve?

Add docx support for manual parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-20 17:03:02 +08:00
fb56a29478 Add docx support for QA parser (#1213)
### What problem does this PR solve?

Add docx support for QA parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-20 16:09:09 +08:00
e35f7610e7 fix too long query exception (#1195)
### What problem does this PR solve?

#1161 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-18 09:50:59 +08:00
7920a5c78d Add markdown support for QA parser (#1180)
### What problem does this PR solve?

Add markdown support for QA parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-18 09:45:13 +08:00
90975460af Add pdf support for QA parser (#1155)
### What problem does this PR solve?

Support extracting questions and answers from PDF files

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-14 15:12:39 +08:00
2023fdc13e fix file preview in file management (#1151)
### What problem does this PR solve?

fix file preview in file management

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-14 10:33:59 +08:00
9ed0e50f6b Update info (#1005)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-31 09:53:04 +08:00
0171082cc5 fix create dialog bug (#982)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 09:25:05 +08:00
8dd45459be Add support for HTML file (#973)
### What problem does this PR solve?

Add support for HTML file

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-30 09:12:55 +08:00
7eee193956 fix #917 #915 (#946)
### What problem does this PR solve?

#917 
#915

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-28 11:13:02 +08:00
a12fcf9156 fix minio helth bug (#850)
### What problem does this PR solve?

#643 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-20 19:35:30 +08:00
GYH
c27c02ea67 Split Excel file into different chunks (#847)
### What problem does this PR solve?


Split Excel into different chunk
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-20 18:35:15 +08:00
6ff63ee2ba Support for code files parse (#789)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-15 16:34:28 +08:00
7013d7f620 refine text decode (#657)
### What problem does this PR solve?
#651 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-07 12:25:47 +08:00
674b3aeafd fix disable and enable llm setting in dialog (#616)
### What problem does this PR solve?
#614 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-30 11:04:14 +08:00
8acc01a227 refine redis connection (#599)
### What problem does this PR solve?

#591 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-29 08:52:38 +08:00
8c07992b6c refine code (#595)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-28 19:13:33 +08:00
f1c98aad6b Update version info (#564)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-04-26 20:07:26 +08:00
369400c483 fix bug of table in docx (#510)
### What problem does this PR solve?
#509 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-23 19:10:33 +08:00
aa71462a9f fix bug #502 (#504)
### What problem does this PR solve?

#502 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-23 16:01:46 +08:00
72384b191d Add .doc file parser. (#497)
### What problem does this PR solve?
Add `.doc` file parser, using tika.
```
pip install tika
```
```
from tika import parser
from io import BytesIO

def extract_text_from_doc_bytes(doc_bytes):
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    return parsed["content"]
```
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
2024-04-23 15:31:43 +08:00
0dfc8ddc0f enlarge docker memory usage (#501)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-23 14:41:10 +08:00
a38e163035 remove doc from supported processing types (#488)
### What problem does this PR solve?
#474 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-22 15:46:09 +08:00
11949f9f2e feat: support markdown files (#483)
parse markdown files as txt

### What problem does this PR solve?

support markdown files

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-22 14:43:36 +08:00
b8e58fe27a add redis to accelerate access of minio (#482)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-22 14:11:09 +08:00
ed6081845a Fit a lot of encodings for text file. (#458)
### What problem does this PR solve?

#384

### Type of change

- [x] Performance Improvement
2024-04-19 18:02:53 +08:00
YC
e8570da856 Update table.py to convert clmns to string (#414)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 19:48:11 +08:00
800b5c7aaa fix bulk error for table method (#407)
### What problem does this PR solve?


Issue link:#366

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 12:17:14 +08:00
d4e0bfc8a5 fix gb2312 encoding issue (#394)
### What problem does this PR solve?

Issue link:#384
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-16 19:45:14 +08:00
f6c7204002 refine log format (#312)
### What problem does this PR solve?

Issue link:#264
### Type of change


- [x] Documentation Update
- [x] Refactoring
2024-04-11 10:13:43 +08:00
a0a480b708 continue add layout model for 'laws' (#292)
### What problem does this PR solve?

Issue link:#289

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-10 14:06:36 +08:00
243de6ac90 add a new model for 'Laws' (#290)
### What problem does this PR solve?

Issue link:#289
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-10 11:59:00 +08:00
bb96180e77 Add more information on vm map count setting (#241)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

Issue link:#[[Link the issue
here](https://github.com/infiniflow/ragflow/issues/236)]

### Type of change

- [x] Documentation Update
2024-04-07 09:41:53 +08:00
23b448cf96 fix docker compose issue (#238)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

Issue link:#[[Link the issue
here](https://github.com/infiniflow/ragflow/issues/226)]

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-07 09:04:32 +08:00