Commit Graph

14 Commits

Author SHA1 Message Date
8dd45459be Add support for HTML file (#973)
### What problem does this PR solve?

Add support for HTML file

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-30 09:12:55 +08:00
7eee193956 fix #917 #915 (#946)
### What problem does this PR solve?

#917 
#915

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-28 11:13:02 +08:00
46454362d7 fix raptor bugs (#928)
### What problem does this PR solve?

#922 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-27 11:01:20 +08:00
9e3a0e4d03 The fasttext library is missing, and it is used in the operators.py file. (#925)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-27 08:18:47 +08:00
a6e4b74d94 remove unused dependency (#664)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-07 19:46:17 +08:00
de839fc3f0 optimize srv broker and executor logic (#630)
### What problem does this PR solve?

Optimize task broker and executor for reduce memory usage and deployment
complexity.

### Type of change
- [x] Performance Improvement
- [x] Refactoring

### Change Log
- Enhance redis utils for message queue(use stream)
- Modify task broker logic via message queue (1.get parse event from
message queue 2.use ThreadPoolExecutor async executor )
- Modify the table column name of document and task (process_duation ->
process_duration maybe just a spelling mistake)
- Reformat some code style(just what i see)
- Add requirement_dev.txt for developer
- Add redis container on docker compose

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-05-07 11:43:33 +08:00
c89f3c3cdb Fix missing 'ollama' package in requirements.txt (#621)
### What problem does this PR solve?

This commit resolves an issue where the 'ollama' package was
inadvertently omitted from the requirements.txt file. The package has
now been added to ensure all dependencies are correctly installed for
the project.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-30 16:29:46 +08:00
5d7f573379 Fix: missing 'redis' package in requirements.txt (#622)
### What problem does this PR solve?

This commit resolves an issue where the 'redis' package was
inadvertently omitted from the requirements.txt file. The package has
now been added to ensure all dependencies are correctly installed for
the project.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-30 16:29:27 +08:00
cab274f560 remove PyMuPDF (#618)
### What problem does this PR solve?
#613 

### Type of change


- [x] Other (please describe):
2024-04-30 12:38:09 +08:00
72384b191d Add .doc file parser. (#497)
### What problem does this PR solve?
Add `.doc` file parser, using tika.
```
pip install tika
```
```
from tika import parser
from io import BytesIO

def extract_text_from_doc_bytes(doc_bytes):
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    return parsed["content"]
```
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
2024-04-23 15:31:43 +08:00
890561703b Add bce-embedding and fastembed (#383)
### What problem does this PR solve?


Issue link:#326

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 16:42:19 +08:00
a7be5d4e8b build ragflow image from scratch (#376)
### What problem does this PR solve?

issue: #205 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 12:29:58 +08:00
826ad6a33a feat: FastEmbed embedding support (#291)
### Description

Following up on https://github.com/infiniflow/ragflow/pull/275, this PR
adds support for FastEmbed model configurations.

The options are not exhaustive. You can find the full list
[here](https://qdrant.github.io/fastembed/examples/Supported_Models/).

P.S. I ran into OOM issues when building the image.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-04-15 15:58:06 +08:00
71fe314955 refine page ranges (#147) 2024-03-25 13:11:57 +08:00