Feat: parse email (#10181)

### What problem does this PR solve?

- Dataflow support email.
- Fix old email parser.
- Add new depends to parse msg file.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): add new depends.
This commit is contained in:
Lynn
2025-09-22 09:29:38 +08:00
committed by GitHub
parent b5d6a6e8f2
commit 028c2d83e9
7 changed files with 309 additions and 6 deletions

View File

@ -78,7 +78,7 @@ def chunk(
_add_content(msg, msg.get_content_type())
sections = TxtParser.parser_txt("\n".join(text_txt)) + [
(line, "") for line in HtmlParser.parser_txt("\n".join(html_txt)) if line
(line, "") for line in HtmlParser.parser_txt("\n".join(html_txt), chunk_token_num=parser_config["chunk_token_num"]) if line
]
st = timer()