Add 'One' chunk method (#137)

KevinHuSh
2024-03-20 18:57:22 +08:00
committed by GitHub
parent fce14ee187
commit 5875c8ba08
11 changed files with 143 additions and 24 deletions


@@ -49,7 +49,7 @@ class Pdf(PdfParser):
 def chunk(filename, binary=None, from_page=0, to_page=100000, lang="Chinese", callback=None, **kwargs):
     """
-    Supported file formats are docx, pdf, txt.
+    Supported file formats are docx, pdf, excel, txt.
     This method applies a naive approach to chunking files.
     Successive text will be sliced into pieces using 'delimiter'.
     Next, these successive pieces are merged into chunks whose token count does not exceed 'Max token number'.
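The docstring describes a two-step naive chunker: slice the text after each delimiter character, then greedily merge successive pieces while staying under a token budget. The Python sketch below illustrates that idea under stated assumptions. It is not the repository's implementation. `naive_chunk`, `split_pieces`, `merge_pieces`, the whitespace-based token counter, and the default delimiter string are all hypothetical stand-ins. A real deployment would count tokens with the embedding model's tokenizer.

```python
from typing import Callable, List

# Hypothetical default: newline plus common Western and CJK sentence enders.
DEFAULT_DELIMITER = "\n!?。；！？"


def whitespace_tokens(text: str) -> int:
    """Crude token count (assumption): number of whitespace-separated words."""
    return len(text.split())


def split_pieces(text: str, delimiter: str) -> List[str]:
    """Slice text after every character that appears in `delimiter`."""
    pieces: List[str] = []
    buf: List[str] = []
    for ch in text:
        buf.append(ch)
        if ch in delimiter:
            pieces.append("".join(buf))
            buf = []
    if buf:  # trailing text with no closing delimiter
        pieces.append("".join(buf))
    # Drop pieces that are only whitespace (e.g. blank lines).
    return [p for p in pieces if p.strip()]


def merge_pieces(
    pieces: List[str], max_tokens: int, count_tokens: Callable[[str], int]
) -> List[str]:
    """Greedily concatenate successive pieces into chunks of <= max_tokens."""
    chunks: List[str] = []
    current, current_tokens = "", 0
    for piece in pieces:
        n = count_tokens(piece)
        # Close the current chunk if this piece would push it over the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(current.strip())
            current, current_tokens = "", 0
        # A single piece larger than max_tokens still becomes its own chunk.
        current += piece
        current_tokens += n
    if current:
        chunks.append(current.strip())
    return chunks


def naive_chunk(
    text: str,
    delimiter: str = DEFAULT_DELIMITER,
    max_tokens: int = 128,  # hypothetical stand-in for 'Max token number'
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> List[str]:
    """Split on delimiters, then merge pieces under the token budget."""
    return merge_pieces(split_pieces(text, delimiter), max_tokens, count_tokens)


if __name__ == "__main__":
    print(naive_chunk("One two. Three four five. Six.", delimiter=".", max_tokens=4))
    # prints ['One two.', 'Three four five. Six.']
```

Greedy merging keeps pieces in their original order and never cuts a piece, so a single sentence longer than the budget yields one oversized chunk. The sketch accepts that trade-off rather than splitting mid-sentence.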