Overview
Context summarization automatically compresses older conversation history when token or message limits are reached. It is configured via LLMContextSummarizationConfig and managed by LLMContextSummarizer.
For a walkthrough of how to enable and customize context summarization, see the Context Summarization guide.
LLMContextSummarizationConfig
Maximum context size in estimated tokens before triggering summarization.
Tokens are estimated using the heuristic of 1 token per 4 characters.
Target token count for the generated summary. Passed to the LLM as
max_tokens. Auto-adjusted to 80% of max_context_tokens if it exceeds that
value.
Maximum number of new messages before triggering summarization, even if the
token limit has not been reached.
Number of recent messages to preserve uncompressed after each summarization.
Custom system prompt for the LLM when generating summaries. When None, uses
a built-in default prompt.
Template for formatting the summary when injected into context. Must contain
{summary} as a placeholder. Allows wrapping summaries in custom delimiters
(e.g., XML tags) so system prompts can distinguish summaries from live
conversation.
Dedicated LLM service for generating summaries. When set, summarization
requests are sent to this service instead of the pipeline’s primary LLM.
Useful for routing summarization to a cheaper or faster model. When None,
the pipeline LLM handles summarization.
Maximum time in seconds to wait for the LLM to generate a summary. If
exceeded, summarization is aborted and future summarization attempts are
unblocked. Set to None to disable the timeout.
LLMContextSummarizer
Available on the LLMAssistantAggregator when enable_context_summarization=True. Access it via assistant_aggregator._summarizer.
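The configuration rules described above (the 4-characters-per-token estimate, the 80% cap on the summary target, the two triggers, and the {summary} placeholder) can be sketched in plain Python. This is a rough illustration, not the library's implementation: SketchConfig and every field name other than max_context_tokens are hypothetical stand-ins, and treating a limit as "reached" with >= is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

CHARS_PER_TOKEN = 4  # heuristic from the reference: 1 token per 4 characters


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the real config class."""

    max_context_tokens: int = 8000
    summary_token_target: int = 6000  # hypothetical field name
    max_new_messages: int = 20  # hypothetical field name
    summary_template: str = "<summary>{summary}</summary>"  # hypothetical field name

    def __post_init__(self) -> None:
        # Cap the summary target at 80% of the context limit.
        cap = self.max_context_tokens * 8 // 10
        if self.summary_token_target > cap:
            self.summary_token_target = cap
        # The template must carry the {summary} placeholder.
        if "{summary}" not in self.summary_template:
            raise ValueError("summary_template must contain {summary}")


def estimate_tokens(messages: List[Dict[str, str]]) -> int:
    """Estimate tokens as total content characters divided by four."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // CHARS_PER_TOKEN


def should_summarize(
    config: SketchConfig, messages: List[Dict[str, str]], new_message_count: int
) -> bool:
    """Trigger on either limit: estimated tokens or new-message count."""
    over_tokens = estimate_tokens(messages) >= config.max_context_tokens
    over_messages = new_message_count >= config.max_new_messages
    return over_tokens or over_messages


def format_summary(config: SketchConfig, summary: str) -> str:
    """Wrap the summary text using the configured template."""
    return config.summary_template.replace("{summary}", summary)
```

Checking both limits independently mirrors the description that the message limit applies "even if the token limit has not been reached".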
Event Handlers
| Event | Parameters | Description |
|---|---|---|
| on_summary_applied | event: SummaryAppliedEvent | Emitted after a summary has been successfully applied to the context. |
on_summary_applied
SummaryAppliedEvent
Number of messages in context before summarization.
Number of messages in context after summarization.
Number of messages that were compressed into the summary.
Number of messages preserved uncompressed (system message plus recent
messages).
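As an illustration of how the four counts might be used, the sketch below formats them into a log line. The SummaryAppliedEventSketch class and its field names are hypothetical stand-ins, since this reference describes the fields without naming them. In a real application, the body of describe_summary would presumably live inside an on_summary_applied handler registered on the summarizer; the registration call is not shown here.

```python
from dataclasses import dataclass


@dataclass
class SummaryAppliedEventSketch:
    """Hypothetical stand-in for SummaryAppliedEvent; field names are assumed."""

    messages_before: int  # messages in context before summarization
    messages_after: int  # messages in context after summarization
    messages_summarized: int  # messages compressed into the summary
    messages_preserved: int  # system message plus recent messages kept as-is


def describe_summary(event: SummaryAppliedEventSketch) -> str:
    """Build a one-line description suitable for logging."""
    return (
        f"summarized {event.messages_summarized} messages; "
        f"context went from {event.messages_before} to "
        f"{event.messages_after} messages "
        f"({event.messages_preserved} preserved)"
    )


# Illustrative values only.
example = SummaryAppliedEventSketch(
    messages_before=42,
    messages_after=7,
    messages_summarized=36,
    messages_preserved=6,
)
print(describe_summary(example))
```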