Add docx support for manual parser (#1227)

### What problem does this PR solve?

Add docx support for manual parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
This commit is contained in:
Zhedong Cen
2024-06-20 17:03:02 +08:00
committed by GitHub
parent fb56a29478
commit 3c1444ab19
3 changed files with 189 additions and 84 deletions

@@ -497,3 +497,9 @@ def naive_merge(sections, chunk_token_num=128, delimiter="\n。"):
            add_chunk(sec[s: e], pos)

    return cks

def docx_question_level(p):
    if p.style.name.startswith('Heading'):
        return int(p.style.name.split(' ')[-1]), re.sub(r"\u3000", " ", p.text).strip()
    else:
        return 0, re.sub(r"\u3000", " ", p.text).strip()
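
A minimal sketch of how `docx_question_level` behaves, for reviewers who want to try it without a .docx file. It assumes the argument only needs a `.style.name` and a `.text` attribute, as the function body suggests; `fake_paragraph` is a hypothetical stand-in for a python-docx paragraph and is not part of this commit.

```python
import re
from types import SimpleNamespace


def docx_question_level(p):
    # Same logic as the function added in this commit: heading styles
    # yield their numeric level, everything else is level 0.
    text = re.sub(r"\u3000", " ", p.text).strip()
    if p.style.name.startswith('Heading'):
        return int(p.style.name.split(' ')[-1]), text
    return 0, text


def fake_paragraph(style_name, text):
    # Hypothetical stand-in for a python-docx Paragraph: only the two
    # attributes the function reads (.style.name and .text) are modelled.
    return SimpleNamespace(style=SimpleNamespace(name=style_name), text=text)


heading = docx_question_level(fake_paragraph("Heading 2", "\u3000Installation\u3000"))
body = docx_question_level(fake_paragraph("Normal", "Plug in the device."))

print(heading)  # (2, 'Installation')
print(body)     # (0, 'Plug in the device.')
```

One thing to be aware of: a style whose name starts with `Heading` but does not end in a number (for example a style named just `Heading`) makes `int(...)` raise `ValueError`, so callers may want to guard against that case.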