diff --git a/docs/guides/agent/agent_component_reference/parser.md b/docs/guides/agent/agent_component_reference/parser.md
index 531e4d1f1..95e23cbb6 100644
--- a/docs/guides/agent/agent_component_reference/parser.md
+++ b/docs/guides/agent/agent_component_reference/parser.md
@@ -3,15 +3,15 @@ sidebar_position: 30
 slug: /parser_component
 ---
 
-# Message component
+# Parser component
 
 A component that sets the parsing rules for your dataset.
 
 ---
 
-A **Parser** component sets the parsing rules for various file types, including parsing methods for PDFs , fields to parse for Emails, and OCR methods for images.
+A **Parser** component defines how various file types should be parsed, including parsing methods for PDFs, fields to parse for Emails, and OCR methods for images.
 
 ## Scenario
 
-An **parser** component is auto-populated on the ingestion pipeline canvase and always required in an ingestion pipeline workflow.
\ No newline at end of file
+A **Parser** component is auto-populated on the ingestion pipeline canvas and required in all ingestion pipeline workflows.
\ No newline at end of file
diff --git a/docs/guides/agent/indexer.md b/docs/guides/agent/indexer.md
new file mode 100644
index 000000000..f96d9df16
--- /dev/null
+++ b/docs/guides/agent/indexer.md
@@ -0,0 +1,29 @@
+---
+sidebar_position: 30
+slug: /indexer_component
+---
+
+# Indexer component
+
+A component that defines how chunks are indexed.
+
+---
+
+An **Indexer** component indexes chunks and configures their storage formats in the document engine.
+
+## Scenario
+
+An **Indexer** component is the mandatory ending component for all ingestion pipelines.
+
+## Configurations
+
+### Search method
+
+This setting configures how chunks are stored in the document engine: as full-text, embeddings, or both.
+
+### Filename embedding weight
+
+This setting defines the filename's contribution to the final embedding, which is a weighted combination of both the chunk content and the filename.
Essentially, a higher value gives the filename more influence in the final *composite* embedding.
+
+- 0.1: Filename contributes 10% (chunk content 90%)
+- 0.5 (maximum): Filename contributes 50% (chunk content 50%)
\ No newline at end of file
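
Review note on **Filename embedding weight**: the new text describes the final embedding as a weighted combination of the chunk content and the filename, capped at 0.5. A minimal sketch of one way to read that, assuming a plain linear blend of two equal-length vectors; the function name `composite_embedding` and its parameters are hypothetical and are not taken from the codebase, and the actual Indexer may combine the vectors differently:

```python
def composite_embedding(chunk_vec, filename_vec, filename_weight):
    """Blend a chunk embedding with a filename embedding.

    Assumption (not confirmed by the docs): the blend is linear, i.e.
    filename_weight * filename + (1 - filename_weight) * chunk.
    """
    # The documented maximum for the filename weight is 0.5.
    if not 0.0 <= filename_weight <= 0.5:
        raise ValueError("filename_weight must be between 0.0 and 0.5")
    if len(chunk_vec) != len(filename_vec):
        raise ValueError("embeddings must have the same dimension")

    content_weight = 1.0 - filename_weight
    return [
        filename_weight * f + content_weight * c
        for c, f in zip(chunk_vec, filename_vec)
    ]


# Toy 2-dimensional vectors, chosen only to make the weights visible.
chunk = [1.0, 0.0]
filename = [0.0, 1.0]

low = composite_embedding(chunk, filename, 0.1)   # content dominates
even = composite_embedding(chunk, filename, 0.5)  # even split
```

Under this reading, a weight of 0.1 leaves 90% of the result to the chunk content, and the maximum of 0.5 splits it evenly between filename and content.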