Fix error and format issue (#11975)

### What problem does this PR solve?

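1. Fix an error in book chunking: the chunk token count and delimiter are now read from `parser_config` (the delimiter key was previously misspelled as `delimer` and read from `kwargs`).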
2. Fix format issues.
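The chunking error looks like a silent configuration-lookup bug. In the old call, `chunk_token_num` and the misspelled `delimer` key were read from `kwargs`. Because `dict.get` returns its default when a key is missing, user settings could be ignored without any error. The minimal sketch below assumes, for illustration only, that the caller's settings live in a `parser_config` dict nested inside `kwargs`. `old_lookup`, `new_lookup` and the config values are hypothetical, not RAGFlow code.

```python
# Hypothetical settings a caller might pass; the values are illustrative only.
parser_config = {"chunk_token_num": 512, "delimiter": "\n"}
kwargs = {"parser_config": parser_config}

DEFAULT_DELIMITER = "\n。;!?"


def old_lookup(kwargs):
    # Before the fix: keys are read from kwargs, and "delimer" is misspelled,
    # so neither configured value is found and both defaults are used.
    return kwargs.get("chunk_token_num", 256), kwargs.get("delimer", DEFAULT_DELIMITER)


def new_lookup(parser_config):
    # After the fix: keys are read from parser_config with the correct spelling.
    return (
        parser_config.get("chunk_token_num", 256),
        parser_config.get("delimiter", DEFAULT_DELIMITER),
    )


print(old_lookup(kwargs))         # (256, '\n。;!?')
print(new_lookup(parser_config))  # (512, '\n')
```

No exception is raised in the old version. The only symptom is chunks built with the default budget and delimiters.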

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Jin Hai
2025-12-16 19:29:37 +08:00
committed by GitHub
parent 344a106eba
commit 0e8b9588ba
9 changed files with 51 additions and 84 deletions

@@ -166,9 +166,10 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
 sections = [s.split("@") for s, _ in sections]
 sections = [(pr[0], "@" + pr[1]) if len(pr) == 2 else (pr[0], '') for pr in sections ]
 chunks = naive_merge(
-    sections, kwargs.get(
-        "chunk_token_num", 256), kwargs.get(
-        "delimer", "\n。;!?"))
+    sections,
+    parser_config.get("chunk_token_num", 256),
+    parser_config.get("delimiter", "\n。;!?")
+)
 # is it English
 # is_english(random_choices([t for t, _ in sections], k=218))
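The two values passed to `naive_merge` are a token budget per chunk and a set of delimiter characters. The sketch below shows one way such parameters could shape the output. `simple_merge` is a hypothetical, simplified stand-in and NOT RAGFlow's `naive_merge`. It assumes that every character in `delimiter` is a split point and approximates tokens as whitespace-separated words. Real tokenization differs.

```python
import re


def simple_merge(sections, chunk_token_num=256, delimiter="\n。;!?"):
    """Hypothetical simplified merger (not RAGFlow's naive_merge).

    sections: list of (text, position_tag) pairs, shaped like those built in the diff.
    Tokens are approximated by whitespace-separated words (an assumption).
    """
    # Treat every character in `delimiter` as a possible split point.
    pattern = "[" + re.escape(delimiter) + "]"
    chunks, current, current_tokens = [], [], 0
    for text, _tag in sections:
        for piece in re.split(pattern, text):
            piece = piece.strip()
            if not piece:
                continue
            n = len(piece.split())
            # Close the current chunk if adding this piece would exceed the budget.
            if current and current_tokens + n > chunk_token_num:
                chunks.append(" ".join(current))
                current, current_tokens = [], 0
            current.append(piece)
            current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(simple_merge([("a b\nc d\ne f", "")], chunk_token_num=4, delimiter="\n"))
# ['a b c d', 'e f']
```

With a smaller budget the same input yields more, shorter chunks. This is why silently falling back to the default `chunk_token_num` changes the chunking result.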