Overview

Context summarization automatically compresses older conversation history when token or message limits are reached. It is configured via LLMContextSummarizationConfig and managed by LLMContextSummarizer. For a walkthrough of how to enable and customize context summarization, see the Context Summarization guide.

LLMContextSummarizationConfig

from pipecat.utils.context.llm_context_summarization import LLMContextSummarizationConfig
Controls when and how context summarization occurs.
max_context_tokens
int, default: 8000
Maximum context size in estimated tokens before triggering summarization. Tokens are estimated using the heuristic of 1 token per 4 characters.
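The 1-token-per-4-characters heuristic can be illustrated with a short sketch (estimate_tokens is a hypothetical helper for illustration, not part of Pipecat's API):

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Estimate token count using the 1 token per 4 characters heuristic."""
    total_chars = sum(len(m.get("content") or "") for m in messages)
    return total_chars // 4

# An 8000-token budget corresponds to roughly 32,000 characters of context.
messages = [{"role": "user", "content": "x" * 10_000}]
print(estimate_tokens(messages))  # 2500
```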
target_context_tokens
int, default: 6000
Target token count for the generated summary. Passed to the LLM as max_tokens. Auto-adjusted to 80% of max_context_tokens if it exceeds that value.
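The auto-adjustment rule can be sketched as follows (resolve_target_tokens is a hypothetical helper mirroring the documented behavior, not Pipecat's internal code):

```python
def resolve_target_tokens(max_context_tokens: int, target_context_tokens: int) -> int:
    """Clamp the summary target to 80% of the context limit, as documented."""
    ceiling = int(max_context_tokens * 0.8)
    return min(target_context_tokens, ceiling)

print(resolve_target_tokens(8000, 6000))  # 6000 (within the limit, unchanged)
print(resolve_target_tokens(8000, 7500))  # 6400 (clamped to 80% of 8000)
```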
max_unsummarized_messages
int, default: 20
Maximum number of new messages before triggering summarization, even if the token limit has not been reached.
min_messages_after_summary
int, default: 4
Number of recent messages to preserve uncompressed after each summarization.
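Taken together, the token and message limits act as two independent triggers. A hypothetical predicate sketching that logic (illustrative only, not Pipecat's actual implementation):

```python
def should_summarize(
    estimated_tokens: int,
    new_message_count: int,
    max_context_tokens: int = 8000,
    max_unsummarized_messages: int = 20,
) -> bool:
    """Summarize when either the token budget or the message budget is reached."""
    return (
        estimated_tokens >= max_context_tokens
        or new_message_count >= max_unsummarized_messages
    )

print(should_summarize(5000, 25))  # True: message limit reached first
print(should_summarize(9000, 3))   # True: token limit reached
print(should_summarize(5000, 10))  # False: neither limit reached
```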
summarization_prompt
Optional[str], default: None
Custom system prompt for the LLM when generating summaries. When None, uses a built-in default prompt.
summary_message_template
str, default: "Conversation summary: {summary}"
Template for formatting the summary when injected into context. Must contain {summary} as a placeholder. Allows wrapping summaries in custom delimiters (e.g., XML tags) so system prompts can distinguish summaries from live conversation.
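For example, a template that wraps the summary in XML-style tags so a system prompt can address it explicitly (the tag name and summary text are illustrative):

```python
template = "<conversation_summary>{summary}</conversation_summary>"

summary = "The user asked about pricing tiers and chose the Pro plan."
print(template.format(summary=summary))
# <conversation_summary>The user asked about pricing tiers and chose the Pro plan.</conversation_summary>
```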
llm
Optional[LLMService], default: None
Dedicated LLM service for generating summaries. When set, summarization requests are sent to this service instead of the pipeline’s primary LLM. Useful for routing summarization to a cheaper or faster model. When None, the pipeline LLM handles summarization.
summarization_timeout
Optional[float], default: 120.0
Maximum time in seconds to wait for the LLM to generate a summary. If exceeded, summarization is aborted and future summarization attempts are unblocked. Set to None to disable the timeout.
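Putting the options together, a configuration might look like the sketch below. Parameter names follow the reference above; keyword-argument construction is assumed, and the template string is illustrative:

```python
from pipecat.utils.context.llm_context_summarization import LLMContextSummarizationConfig

config = LLMContextSummarizationConfig(
    max_context_tokens=8000,       # trigger when estimated context exceeds this
    target_context_tokens=6000,    # passed to the LLM as max_tokens
    max_unsummarized_messages=20,  # also trigger on message count
    min_messages_after_summary=4,  # keep the most recent messages verbatim
    summary_message_template="<conversation_summary>{summary}</conversation_summary>",
    summarization_timeout=120.0,   # abort a summary request after two minutes
)
```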

LLMContextSummarizer

from pipecat.processors.aggregators.llm_context_summarizer import LLMContextSummarizer
Monitors context size and orchestrates summarization. Created automatically by LLMAssistantAggregator when enable_context_summarization=True. Access it via assistant_aggregator._summarizer.

Event Handlers

on_summary_applied
Parameters: event: SummaryAppliedEvent
Emitted after a summary has been successfully applied to the context.

on_summary_applied

@summarizer.event_handler("on_summary_applied")
async def on_summary_applied(summarizer, event: SummaryAppliedEvent):
    logger.info(
        f"Context summarized: {event.original_message_count} -> "
        f"{event.new_message_count} messages"
    )

SummaryAppliedEvent

from pipecat.processors.aggregators.llm_context_summarizer import SummaryAppliedEvent
Event data emitted when context summarization completes successfully.
original_message_count
int
Number of messages in context before summarization.
new_message_count
int
Number of messages in context after summarization.
summarized_message_count
int
Number of messages that were compressed into the summary.
preserved_message_count
int
Number of messages preserved uncompressed (system message plus recent messages).