Mirror of https://github.com/infiniflow/ragflow.git, synced 2025-12-08 20:42:30 +08:00
Feat(nlp): add "怎么办" pattern to question word removal (#10284)
### What problem does this PR solve?

Added "怎么办" ("what should I do?") to the regex pattern in the `rmWWW` method, so that query cleaning removes this common question phrase along with the other question words.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
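For context, `rmWWW` cleans a query by running it through an ordered list of `(pattern, replacement)` pairs with `re.sub`. The sketch below is a minimal, self-contained approximation and not the RAGFlow source. Only the two patterns are taken from this commit's diff. The function name `rm_question_words`, the `re.IGNORECASE` flag, and the fallback to the original text when everything is stripped are assumptions made for illustration.

```python
import re

# The two (pattern, replacement) pairs visible in this commit's diff.
# The real rmWWW list may hold further entries that are not reproduced here.
PATTERNS = [
    (
        r"是*(怎么办|什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有|呀|谁|哪位|哪个)是*",
        "",
    ),
    (r"(^| )(what|who|how|which|where|why)('re|'s)? ", " "),
]


def rm_question_words(txt: str) -> str:
    """Hypothetical stand-in for rmWWW: strip question words from a query."""
    original = txt
    for pattern, repl in PATTERNS:
        # Assumption: case-insensitive matching, so "What" is removed too.
        txt = re.sub(pattern, repl, txt, flags=re.IGNORECASE)
    # Assumption: return the input unchanged if cleaning leaves nothing.
    return txt if txt else original


print(rm_question_words("手机没电了怎么办"))  # 手机没电了
print(rm_question_words("what is ragflow"))   # " is ragflow"
print(rm_question_words("怎么办"))            # 怎么办 (fallback)
```

With only these two patterns, the English rule removes just the leading question word. Dropping the remaining `is` would need a further pattern.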
@@ -56,7 +56,7 @@ class FulltextQueryer:
     def rmWWW(txt):
         patts = [
             (
-                r"是*(什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有|呀|谁|哪位|哪个)是*",
+                r"是*(怎么办|什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有|呀|谁|哪位|哪个)是*",
                 "",
             ),
             (r"(^| )(what|who|how|which|where|why)('re|'s)? ", " "),
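Python's `re` tries the alternatives of a group from left to right at each position and commits to the first one that lets the whole pattern match. Because the trailing `是*` can match nothing, that is simply the first alternative that matches. Before this change, the shorter `怎么` therefore matched and left a stray `办` behind. The comparison below illustrates this. The two patterns are copied from the diff, while the sample query and the `LATE` variant are hypothetical.

```python
import re

# Question-word group before this commit, copied from the diff.
OLD = r"是*(什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有|呀|谁|哪位|哪个)是*"
# After this commit: "怎么办" becomes the first alternative.
NEW = OLD.replace("是*(", "是*(怎么办|", 1)
# Hypothetical variant: "怎么办" placed right after "怎么" instead of first.
LATE = OLD.replace("|怎么|", "|怎么|怎么办|", 1)

query = "电脑蓝屏了怎么办"  # hypothetical sample query

# At each position, re tries the alternatives in order and keeps the first
# one that lets the whole pattern match.
results = {
    name: re.sub(pattern, "", query)
    for name, pattern in (("old", OLD), ("new", NEW), ("late", LATE))
}

for name, cleaned in results.items():
    print(name, cleaned)
# old 电脑蓝屏了办
# new 电脑蓝屏了
# late 电脑蓝屏了办
```

The `late` variant shows that the new entry must come before `怎么` in the group to have any effect. Placing it first, as the commit does, satisfies that.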