Docs: Added token chunker and title chunker components (#10711)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2026-01-23 03:26:53 +08:00 · 2025-10-21 20:11:23 +08:00
parent 1694f32e8e
commit 9a4cd81891
2 changed files with 70 additions and 4 deletions
--- a/docs/guides/agent/agent_component_reference/chunker_title.md
+++ b/docs/guides/agent/agent_component_reference/chunker_title.md
@@ -0,0 +1,40 @@
+---
+sidebar_position: 31
+slug: /chunker_title_component
+---
+
+# Title chunker component
+
+A component that splits texts into chunks by heading level.
+
+---
+
+A **Title chunker** component is a text splitter that uses a specified heading level as the delimiter to define chunk boundaries and create chunks.
+
+## Scenario
+
+A **Title chunker** component is optional, usually placed immediately after **Parser**.
+
+:::caution WARNING
+Placing a **Title chunker** after a **Token chunker** is invalid and will cause an error. The system does not currently enforce this restriction, so verify the component order yourself.
+:::
+
+## Configurations
+
+### Hierarchy
+
+Specifies the heading level to define chunk boundaries: 
+
+- H1
+- H2
+- H3 (Default)
+- H4
+
+Click **+ Add** to add heading levels here or update the corresponding **Regular Expressions** fields for custom heading patterns.
+
+### Output
+
+The global variable name for the output of the **Title chunker** component, which can be referenced by subsequent components in the ingestion pipeline.
+
+- Default: `chunks`
+- Type: `Array<Object>`
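
For reviewers, a minimal sketch (outside the patch itself) of what the **Hierarchy** setting in `chunker_title.md` describes. It assumes Markdown-style `#` headings and assumes that every heading at or above the chosen level starts a new chunk; the function name `split_by_heading` and the `text` key are hypothetical and are not taken from the component's actual implementation.

```python
import re


def split_by_heading(text: str, level: int = 3) -> list[dict]:
    """Start a new chunk at every Markdown heading of depth <= level.

    Assumption: headings are written as '#', '##', ... followed by a space.
    The real component may rely on configurable regular expressions instead.
    """
    heading = re.compile(r"^#{1,%d}\s+\S" % level)
    chunks: list[dict] = []
    current: list[str] = []
    for line in text.splitlines():
        # A qualifying heading closes the chunk collected so far.
        if heading.match(line) and current:
            chunks.append({"text": "\n".join(current)})
            current = []
        current.append(line)
    if current:
        chunks.append({"text": "\n".join(current)})
    return chunks


if __name__ == "__main__":
    doc = "# A\nintro\n## B\nbody\n#### C\ndeep"
    for chunk in split_by_heading(doc, level=2):
        print(repr(chunk["text"]))
```

With `level=2`, the `####` heading in the sample does not open a new chunk, so its text stays attached to the preceding `##` section.
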
--- a/docs/guides/agent/agent_component_reference/chunker_token.md
+++ b/docs/guides/agent/agent_component_reference/chunker_token.md
@@ -3,15 +3,41 @@ sidebar_position: 32
 slug: /chunker_token_component
 ---

-# Parser component
+# Token chunker component

-A component that sets the parsing rules for your dataset.
+A component that splits texts into chunks, respecting a maximum token limit and using delimiters to find optimal breakpoints.

 ---

-A **Parser** component defines how various file types should be parsed, including parsing methods for PDFs , fields to parse for Emails, and OCR methods for images.
+A **Token chunker** component is a text splitter that creates chunks up to a recommended maximum token length, using delimiters to place chunk breakpoints at logical positions. It splits long texts into appropriately sized, semantically related chunks.


 ## Scenario

-A **Parser** component is auto-populated on the ingestion pipeline canvas and required in all ingestion pipeline workflows.
+A **Token chunker** component is optional, usually placed immediately after **Parser** or **Title chunker**.
+
+## Configurations
+
+### Recommended chunk size
+
+The recommended maximum token limit for each created chunk. The **Token chunker** component creates chunks at specified delimiters. If this token limit is reached before a delimiter, a chunk is created at that point.
+
+### Overlapped percent (%)
+
+This defines the overlap percentage between chunks. An appropriate degree of overlap ensures semantic coherence without creating excessive, redundant tokens for the LLM.
+
+- Default: 0%
+- Maximum: 30%
+
+
+### Delimiters
+
+Defaults to `\n`. Click the right-hand **Recycle bin** button to remove it, or click **+ Add** to add a delimiter.
+
+
+### Output
+
+The global variable name for the output of the **Token chunker** component, which can be referenced by subsequent components in the ingestion pipeline.
+
+- Default: `chunks`
+- Type: `Array<Object>`
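
For reviewers, a minimal sketch (outside the patch itself) of how the three settings in `chunker_token.md` could interact. It is an illustration under stated assumptions, not the component's implementation: tokens are approximated by whitespace-separated words, the overlap is carried over as whole tokens from the end of the previous chunk, and the name `chunk_by_tokens` and the `text` key are hypothetical.

```python
def chunk_by_tokens(text: str, chunk_size: int = 8,
                    overlap_percent: int = 0, delimiter: str = "\n") -> list[dict]:
    """Pack delimiter-separated pieces into chunks of at most chunk_size tokens.

    Assumption: a "token" is a whitespace-separated word; a real tokenizer
    would count differently. Tokens are re-joined with spaces for simplicity.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= overlap_percent <= 30:
        raise ValueError("overlap_percent must be between 0 and 30")
    overlap = chunk_size * overlap_percent // 100  # tokens repeated in the next chunk
    chunks: list[dict] = []
    current: list[str] = []  # tokens of the chunk being built
    fresh = 0                # tokens in `current` not carried over as overlap

    def flush() -> None:
        nonlocal current, fresh
        if fresh:  # never emit a chunk that holds only carried-over tokens
            chunks.append({"text": " ".join(current)})
            current = current[-overlap:] if overlap else []
            fresh = 0

    for piece in text.split(delimiter):
        tokens = piece.split()
        # Prefer to break at the delimiter when the next piece would not fit.
        if len(current) + len(tokens) > chunk_size:
            flush()
        for token in tokens:
            # Limit reached before a delimiter: cut the chunk at this point.
            if len(current) >= chunk_size:
                flush()
            current.append(token)
            fresh += 1
    flush()
    return chunks


if __name__ == "__main__":
    sample = "a b c\nd e f\ng h i j k l m n o p"
    for chunk in chunk_by_tokens(sample, chunk_size=8):
        print(chunk["text"])
```

In the sample, the first two lines fit together under the limit and are closed at a delimiter, while the long third line is cut as soon as the limit is reached.
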