Mirror of https://github.com/infiniflow/ragflow.git (synced 2025-12-08 20:42:30 +08:00)
Feat(nlp): add "怎么办" pattern to question word removal (#10284)
### What problem does this PR solve?

Added "怎么办" to the regex pattern in the `rmWWW` method to improve query cleaning by removing this common question phrase along with other question words.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
@@ -56,7 +56,7 @@ class FulltextQueryer:
     def rmWWW(txt):
         patts = [
             (
-                r"是*(什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有|呀|谁|哪位|哪个)是*",
+                r"是*(怎么办|什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有|呀|谁|哪位|哪个)是*",
                 "",
             ),
             (r"(^| )(what|who|how|which|where|why)('re|'s)? ", " "),
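To illustrate the effect of the change, here is a minimal sketch that applies the old and new Chinese question-word patterns from the diff to a sample query. The helper `strip_question_words`, the constant names, and the sample query are assumptions made for this illustration and are not part of the commit. The sketch covers only this one substitution, not the rest of `rmWWW`. "怎么办" roughly means "what should I do".

```python
import re

# Alternatives shared by both versions of the pattern (copied from the diff).
_ALTERNATIVES = (
    "什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是"
    "|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有"
    "|呀|谁|哪位|哪个"
)

# Pattern before the commit: no "怎么办" alternative.
OLD_PATTERN = rf"是*({_ALTERNATIVES})是*"
# Pattern after the commit: "怎么办" is added as the first alternative.
NEW_PATTERN = rf"是*(怎么办|{_ALTERNATIVES})是*"


def strip_question_words(text: str, pattern: str) -> str:
    """Hypothetical helper: delete every match of `pattern` from `text`."""
    return re.sub(pattern, "", text)


query = "电脑蓝屏怎么办"  # "What should I do about a blue screen on my computer?"

# The old pattern only matches "怎么", so a stray "办" is left behind.
print(strip_question_words(query, OLD_PATTERN))  # 电脑蓝屏办
# The new pattern removes the whole phrase.
print(strip_question_words(query, NEW_PATTERN))  # 电脑蓝屏
```

Python's `re` module tries alternatives from left to right and takes the first one that matches, so "怎么办" needs to appear before the shorter "怎么" to win. Placing it at the front of the group, as the commit does, satisfies that.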