104 Commits

Author SHA1 Message Date
0dfc8ddc0f enlarge docker memory usage (#501)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-23 14:41:10 +08:00
a38e163035 remove doc from supported processing types (#488)
### What problem does this PR solve?
#474 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-22 15:46:09 +08:00
11949f9f2e feat: support markdown files (#483)
parse markdown files as txt

### What problem does this PR solve?

support markdown files

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-22 14:43:36 +08:00
b8e58fe27a add redis to accelerate access of minio (#482)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-22 14:11:09 +08:00
ed6081845a Support many more encodings for text files. (#458)
### What problem does this PR solve?

#384

### Type of change

- [x] Performance Improvement
2024-04-19 18:02:53 +08:00
YC
e8570da856 Update table.py to convert clmns to string (#414)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 19:48:11 +08:00
800b5c7aaa fix bulk error for table method (#407)
### What problem does this PR solve?


Issue link:#366

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 12:17:14 +08:00
d4e0bfc8a5 fix gb2312 encoding issue (#394)
### What problem does this PR solve?

Issue link:#384
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-16 19:45:14 +08:00
f6c7204002 refine log format (#312)
### What problem does this PR solve?

Issue link:#264
### Type of change


- [x] Documentation Update
- [x] Refactoring
2024-04-11 10:13:43 +08:00
a0a480b708 continue adding layout model for 'laws' (#292)
### What problem does this PR solve?

Issue link:#289

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-10 14:06:36 +08:00
243de6ac90 add a new model for 'Laws' (#290)
### What problem does this PR solve?

Issue link:#289
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-10 11:59:00 +08:00
bb96180e77 Add more information on vm map count setting (#241)
### What problem does this PR solve?

Issue link: [#236](https://github.com/infiniflow/ragflow/issues/236)

### Type of change

- [x] Documentation Update
2024-04-07 09:41:53 +08:00
23b448cf96 fix docker compose issue (#238)
### What problem does this PR solve?

Issue link: [#226](https://github.com/infiniflow/ragflow/issues/226)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-07 09:04:32 +08:00
392e515c3f fix bug in reloading knowledgebase configuration (#210)
### What problem does this PR solve?

Issue link: [#209](https://github.com/infiniflow/ragflow/issues/209)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-03 11:00:50 +08:00
36f2d7b797 Avoid assertion when an Excel file has no rows (#197)
### What problem does this PR solve?

Issue link: [#196](https://github.com/infiniflow/ragflow/issues/196)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-02 10:51:21 +08:00
f3477202fe refine citation (#161) 2024-03-28 11:45:50 +08:00
bf2e3d7fc1 refine OpenAI API (#159) 2024-03-27 17:55:45 +08:00
37185466e2 README refined (#156) 2024-03-27 13:14:36 +08:00
fd7fcb5baf apply PEP 8 formatting (#155) 2024-03-27 11:33:46 +08:00
da21320b88 fix plainPdf bugs (#152) 2024-03-26 15:11:07 +08:00
71fe314955 refine page ranges (#147) 2024-03-25 13:11:57 +08:00
f6aee7f230 add use layout or not option (#145)
* add use layout or not option

* trivial
2024-03-22 19:21:09 +08:00
6c6b144de2 refine manual parser (#140) 2024-03-21 18:17:32 +08:00
5875c8ba08 Add 'One' chunk method (#137) 2024-03-20 18:57:22 +08:00
6999598101 refine for English corpus (#135) 2024-03-20 16:56:16 +08:00
9a843667b3 fix github account login issue (#132) 2024-03-19 15:31:47 +08:00
9da671b951 refine manual parser (#131) 2024-03-19 12:26:04 +08:00
de09b0e1a4 resolve table issues (#125) 2024-03-15 14:59:28 +08:00
675a9f8d9a add Dockerfile for CUDA environment. Refine table search strategy (#123) 2024-03-14 19:45:29 +08:00
0feb085c88 refine table parser (#120) 2024-03-12 18:56:04 +08:00
f1f09df901 add local llm implementation (#119) 2024-03-12 11:57:08 +08:00
bcb58b7e71 layout refine (#115) 2024-03-08 18:59:53 +08:00
8f86ab9f7f refine pdf parser, add time zone to userinfo (#112) 2024-03-08 11:24:24 +08:00
436c52bbc5 refine presentation parser (#110) 2024-03-07 17:21:38 +08:00
2d7c9080f4 deal with the problem of the stop reason being length (#109) 2024-03-07 16:12:01 +08:00
d7c362f237 adjust hierarchical_merge strategy (#100) 2024-03-06 09:09:16 +08:00
602038ac49 fix task canceling bug (#98) 2024-03-05 16:33:47 +08:00
8a57f2afd5 change callback strategy, add timezone to docker (#96) 2024-03-05 12:08:41 +08:00
7bfaf0df29 fix position extraction bug (#93)
* fix position extraction bug

* remove delimiter for naive parser
2024-03-04 17:08:35 +08:00
685b4d8a95 fix table desc bugs, add positions to chunks (#91) 2024-03-04 14:42:26 +08:00
8a726fb04b solve task execution issues (#90) 2024-03-01 19:48:01 +08:00
3d4315c42a resolve the issue of naive parser (#87) 2024-02-29 18:53:02 +08:00
0429107e80 fix user login issue (#85) 2024-02-29 14:03:07 +08:00
7fd1eca582 init README of deepdoc, add picture processor. (#71)
* init README of deepdoc, add picture processor.

* add resume parsing
2024-02-23 18:28:12 +08:00
cacd36c5e1 use onnx models, new deepdoc (#68) 2024-02-21 16:32:38 +08:00
a8294f2168 Refine resume parts and fix bugs in retrieval using SQL (#66) 2024-02-19 19:22:17 +08:00
5e0a689c43 refactor retieval_test, add SQL retrieval methods (#61) 2024-02-08 17:01:01 +08:00
c5ea37cd30 Add resume parser and fix bugs (#59)
* Update .gitignore

* Update .gitignore

* Add resume parser and fix bugs
2024-02-07 19:27:23 +08:00
407b2523b6 remove unused code, separate layout detection out as a new API. Add new RAG method 'table' (#55) 2024-02-05 18:08:17 +08:00
51482f3e2a Some document API refined. (#53)
Add naive chunking method to RAG
2024-02-02 19:21:37 +08:00