mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-01-01 01:25:32 +08:00
fix(rag/prompts): Restructure metadata extraction rules for precision (#12360)
### What problem does this PR solve? - Simplified and consolidated extraction rules - Emphasized strict evidence-based extraction only - Strengthened enum handling and hallucination prevention - Clarified output requirements for empty results ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
This commit is contained in:
@ -1,14 +1,11 @@
|
||||
## Role: Metadata extraction expert
|
||||
## Constraints:
|
||||
- Core Directive: Extract important structured information from the given content. Output ONLY a valid JSON string. No Markdown (e.g., ```json), no explanations, and no notes.
|
||||
- Schema Parsing: In the `properties` object provided in Schema, the attribute name (e.g., 'author') is the target Key. Extract values based on the `description`; if no `description` is provided, refer to the key's literal meaning.
|
||||
- Extraction Rules: Extract only when there is an explicit semantic correlation. If multiple values or data points match a field's definition, extract and include all of them. Strictly follow the Schema below and only output matched key-value pairs. If the content is irrelevant or no matching information is identified, you **MUST** output {}.
|
||||
- Data Source: Extraction must be based solely on content below. Semantic mapping (synonyms) is allowed, but strictly prohibit hallucinations or fabricated facts.
|
||||
|
||||
## Enum Rules (Triggered ONLY if an enum list is present):
|
||||
- Value Lock: All extracted values MUST strictly match the provided enum list.
|
||||
- Normalization: Map synonyms or variants in the text back to the standard enum value (e.g., "Dec" to "December").
|
||||
- Fallback: Output {} if no explicit match or synonym is identified.
|
||||
## Role: Metadata extraction expert.
|
||||
## Rules:
|
||||
- Strict Evidence Only: Extract a value ONLY if it is explicitly mentioned in the Content.
|
||||
- Enum Filter: For any field with an 'enum' list, the list acts as a strict filter. If no element from the list (or its direct synonym) is found in the Content, you MUST NOT extract that field.
|
||||
- No Meta-Inference: Do not infer values based on the document's nature, format, or category. If the text does not literally state the information, treat it as missing.
|
||||
- Zero-Hallucination: Never invent information or pick a "likely" value from the enum to fill a field.
|
||||
- Empty Result: If no matches are found for any field, or if the content is irrelevant, output ONLY {}.
|
||||
- Output: ONLY a valid JSON string. No Markdown, no notes.
|
||||
|
||||
## Schema for extraction:
|
||||
{{ schema }}
|
||||
|
||||
Reference in New Issue
Block a user