Chunk text into overlapping segments sized for retrieval-augmented generation (RAG) and map-reduce agent pipelines. The splitter prefers paragraph and sentence boundaries, then hard-splits any segment that still exceeds your maximum character limit.
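The boundary preference described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function name `split_segments`, the blank-line paragraph rule, and the punctuation-based sentence rule are all assumptions, and the sketch leaves out overlap and the packing of small segments into larger chunks.

```python
import re


def split_segments(text: str, max_chars: int) -> list[str]:
    """Split text into segments of at most max_chars characters.

    Hypothetical helper: tries paragraphs first, then sentences,
    then falls back to a hard character split.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    segments: list[str] = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            segments.append(paragraph)
            continue
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if not sentence:
                continue
            # Hard split into fixed-width slices; a sentence that already
            # fits produces exactly one slice.
            for start in range(0, len(sentence), max_chars):
                segments.append(sentence[start:start + max_chars])
    return segments
```

With a limit of 20 characters, a sentence of 28 characters comes back as two slices, and the cut can land in the middle of a word.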
When to use it
- Prepare long documents for embedding indexes
- Split tool outputs before sequential LLM summarization
- Keep overlap between chunks so context at chunk boundaries is not lost
Parameters
- Max characters (maxChars): the upper limit on the length of each chunk, in characters
- Overlap (overlap): the number of trailing characters from the previous chunk that are repeated at the start of the next chunk for continuity
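To see how the two parameters interact, here is a simplified sliding-window sketch. It ignores paragraph and sentence boundaries entirely, and the name `chunk_with_overlap` is hypothetical, so treat it as an illustration of the overlap idea rather than the splitter itself.

```python
def chunk_with_overlap(text: str, max_chars: int, overlap: int) -> list[str]:
    """Cut text into windows of max_chars that share `overlap` characters."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")

    step = max_chars - overlap  # new characters contributed by each chunk
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(chunk_with_overlap("abcdefghijklmnopqrstuvwxyz", max_chars=10, overlap=3))
# prints ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk after the first starts with the last three characters of the chunk before it, and no chunk is longer than ten characters.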
Limitations
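Chunking is character-based, not tokenizer-aware, so a chunk that fits the character limit may still exceed a model's token limit. With a very short max length, the hard split may cut a word in the middle. Overlap must be smaller than max characters so that every chunk contains some new text. Output is a JSON array of strings.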
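The overlap constraint is easiest to see with a little arithmetic. Assuming a plain sliding window (which the real splitter only approximates, since it prefers paragraph and sentence boundaries), each chunk after the first advances by max characters minus overlap. The hypothetical helper below estimates the chunk count under that assumption and rejects parameter pairs that would never advance.

```python
import math


def estimate_chunk_count(text_len: int, max_chars: int, overlap: int) -> int:
    """Estimate the number of chunks for a plain sliding window (an assumption)."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if overlap < 0 or overlap >= max_chars:
        # With overlap == max_chars the window would advance by zero characters.
        raise ValueError("overlap must be at least 0 and smaller than max_chars")
    if text_len <= 0:
        return 0
    if text_len <= max_chars:
        return 1
    step = max_chars - overlap  # characters of new text per additional chunk
    return 1 + math.ceil((text_len - max_chars) / step)


print(estimate_chunk_count(10_000, 500, 50))   # prints 23
print(estimate_chunk_count(10_000, 500, 450))  # prints 191
```

As overlap approaches max characters, the step shrinks and the chunk count grows quickly, which raises embedding and summarization cost.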
Example
For a two-paragraph essay longer than 500 characters, maxChars 500 and overlap 50 return a JSON array of multiple strings, each within the size limit.
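A downstream step might parse and sanity-check that output before embedding or summarizing it. The sketch below is one way to do so; `load_chunks` is a hypothetical helper, and the sample string is made-up data standing in for real output.

```python
import json


def load_chunks(raw_output: str, max_chars: int) -> list[str]:
    """Parse a JSON array of strings and check every chunk against the limit."""
    chunks = json.loads(raw_output)
    if not isinstance(chunks, list) or not all(isinstance(c, str) for c in chunks):
        raise ValueError("expected a JSON array of strings")
    too_long = [i for i, c in enumerate(chunks) if len(c) > max_chars]
    if too_long:
        raise ValueError(f"chunks over the limit at indexes {too_long}")
    return chunks


# Made-up sample standing in for real output.
sample_output = '["First paragraph of the essay.", "the essay. Second paragraph."]'
for index, chunk in enumerate(load_chunks(sample_output, max_chars=500)):
    print(index, len(chunk))
```

Note that `len` counts Unicode code points in Python, which may differ from the way another runtime counts characters.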