Context Summarization

Overview

Context summarization automatically compresses older conversation history when token or message limits are reached. It is configured via LLMContextSummarizationConfig and managed by LLMContextSummarizer. For a walkthrough of how to enable and customize context summarization, see the Context Summarization guide.

LLMContextSummarizationConfig

from pipecat.utils.context.llm_context_summarization import LLMContextSummarizationConfig

Controls when and how context summarization occurs.

max_context_tokens

int

default:"8000"

Maximum context size in estimated tokens before triggering summarization. Tokens are estimated using the heuristic of 1 token per 4 characters.
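The 4-characters-per-token heuristic can be sketched in a few lines. This is an illustration only, not the library's implementation: the function names are hypothetical, and rounding down with integer division is an assumption.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Rough estimate: 1 token per 4 characters (rounding down is an assumption)."""
    return len(text) // 4


def estimate_context_tokens(messages: List[Dict]) -> int:
    """Sum the estimate over every message that has plain string content."""
    return sum(
        estimate_tokens(m["content"])
        for m in messages
        if isinstance(m.get("content"), str)
    )
```

Under this heuristic, the default limit of 8000 estimated tokens corresponds to roughly 32,000 characters of message content.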

target_context_tokens

int

default:"6000"

Target token count for the generated summary. Passed to the LLM as max_tokens. Auto-adjusted to 80% of max_context_tokens if it exceeds that value.
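A minimal sketch of the auto-adjustment rule, assuming "that value" refers to max_context_tokens and that the result is truncated to an integer. The helper name is hypothetical; the real check lives inside the config class.

```python
def adjust_target_tokens(max_context_tokens: int, target_context_tokens: int) -> int:
    """Clamp the summary target to 80% of the context limit when it is too large."""
    if target_context_tokens > max_context_tokens:
        # Integer arithmetic avoids float rounding; 80% == 8/10.
        return max_context_tokens * 8 // 10
    return target_context_tokens
```

With the defaults (8000 and 6000) the target is left unchanged; a target of 9000 against a limit of 8000 would become 6400.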

max_unsummarized_messages

int

default:"20"

Maximum number of new messages before triggering summarization, even if the token limit has not been reached.
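The two triggers combine as an either/or check. The sketch below is hypothetical: the function name is made up, and treating the thresholds as inclusive (>=) is an assumption the text above does not settle.

```python
def should_summarize(
    estimated_tokens: int,
    unsummarized_messages: int,
    max_context_tokens: int = 8000,
    max_unsummarized_messages: int = 20,
) -> bool:
    """Return True when either the token limit or the message limit is reached."""
    over_tokens = estimated_tokens >= max_context_tokens
    over_messages = unsummarized_messages >= max_unsummarized_messages
    return over_tokens or over_messages
```

A long-running conversation of short messages therefore still gets summarized once 20 new messages accumulate, even when the token estimate stays well under the limit.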

min_messages_after_summary

int

default:"4"

Number of recent messages to preserve uncompressed after each summarization.
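One way to picture the preservation rule is as a three-way split of the message list. This is a sketch under assumptions: the function name is hypothetical, and keeping a leading system message out of the summary is inferred from the description of preserved_message_count rather than taken from the library code.

```python
from typing import Dict, List, Tuple


def split_for_summary(
    messages: List[Dict], min_messages_after_summary: int = 4
) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Split into (system, to_summarize, recent) without modifying the input."""
    # Assumption: a leading system message is kept as-is and never summarized.
    has_system = bool(messages) and messages[0].get("role") == "system"
    system = messages[:1] if has_system else []
    rest = messages[len(system):]
    # Everything before the cut is compressed; the tail stays verbatim.
    cut = max(len(rest) - min_messages_after_summary, 0)
    return system, rest[:cut], rest[cut:]
```

For a context holding one system message followed by ten conversation messages, the default setting would compress the first six and keep the last four untouched.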

summarization_prompt

Optional[str]

default:"None"

Custom system prompt for the LLM when generating summaries. When None, uses a built-in default prompt.

summary_message_template

str

default:"Conversation summary: {summary}"

Template for formatting the summary when injected into context. Must contain {summary} as a placeholder. Allows wrapping summaries in custom delimiters (e.g., XML tags) so system prompts can distinguish summaries from live conversation.
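As an illustration of the XML-tag use case, the sketch below renders a summary through a custom template. The helper name and the tag name are hypothetical, and using str.format for substitution is an assumption about how the placeholder is filled.

```python
XML_TEMPLATE = "<conversation_summary>\n{summary}\n</conversation_summary>"


def render_summary(template: str, summary: str) -> str:
    """Insert the summary text into the template's {summary} placeholder."""
    if "{summary}" not in template:
        raise ValueError("template must contain the {summary} placeholder")
    return template.format(summary=summary)
```

A system prompt can then tell the model that anything inside the chosen tags is a condensed record of earlier turns rather than something the user just said.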

llm

Optional[LLMService]

default:"None"

Dedicated LLM service for generating summaries. When set, summarization requests are sent to this service instead of the pipeline’s primary LLM. Useful for routing summarization to a cheaper or faster model. When None, the pipeline LLM handles summarization.

summarization_timeout

Optional[float]

default:"120.0"

Maximum time in seconds to wait for the LLM to generate a summary. If exceeded, summarization is aborted and future summarization attempts are unblocked. Set to None to disable the timeout.
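The timeout behavior resembles wrapping the summary request in asyncio.wait_for. The following is a generic sketch, not the library's code: the function name and the generate callable are hypothetical, and returning None on timeout is an illustrative choice.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def summarize_with_timeout(
    generate: Callable[[], Awaitable[str]],
    timeout: Optional[float] = 120.0,
) -> Optional[str]:
    """Run a summary request, giving up after `timeout` seconds.

    A timeout of None waits indefinitely, mirroring the "disable" setting.
    """
    try:
        return await asyncio.wait_for(generate(), timeout=timeout)
    except asyncio.TimeoutError:
        # The request was cancelled; the caller can try again later.
        return None
```

Because asyncio.wait_for cancels the pending call when the deadline passes, a stalled request does not hold up later attempts.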

LLMContextSummarizer

from pipecat.processors.aggregators.llm_context_summarizer import LLMContextSummarizer

Monitors context size and orchestrates summarization. Created automatically by LLMAssistantAggregator when enable_context_summarization=True. Access it via assistant_aggregator._summarizer.

Event Handlers

Event	Parameters	Description
`on_summary_applied`	`event: SummaryAppliedEvent`	Emitted after a summary has been successfully applied to the context.

on_summary_applied

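from loguru import logger

from pipecat.processors.aggregators.llm_context_summarizer import SummaryAppliedEvent

# `summarizer` is the LLMContextSummarizer instance (assistant_aggregator._summarizer)
@summarizer.event_handler("on_summary_applied")
async def on_summary_applied(summarizer, event: SummaryAppliedEvent):
    # Log the message count before and after the summary was applied
    logger.info(
        f"Context summarized: {event.original_message_count} -> "
        f"{event.new_message_count} messages"
    )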

SummaryAppliedEvent

from pipecat.processors.aggregators.llm_context_summarizer import SummaryAppliedEvent

Event data emitted when context summarization completes successfully.

original_message_count

int

Number of messages in context before summarization.

new_message_count

int

Number of messages in context after summarization.

summarized_message_count

int

Number of messages that were compressed into the summary.

preserved_message_count

int

Number of messages preserved uncompressed (system message plus recent messages).
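The four counts can be combined into a single log line. In the sketch below, SummaryAppliedEventStandIn is a hypothetical stand-in that only mirrors the documented field names so the example runs without the library; describe_summary is likewise an illustrative helper, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class SummaryAppliedEventStandIn:
    """Stand-in with the same four fields as SummaryAppliedEvent (illustration only)."""

    original_message_count: int
    new_message_count: int
    summarized_message_count: int
    preserved_message_count: int


def describe_summary(event) -> str:
    """Build a one-line report from the event's message counts."""
    removed = event.original_message_count - event.new_message_count
    return (
        f"{event.summarized_message_count} messages summarized, "
        f"{event.preserved_message_count} preserved, "
        f"context shrank by {removed} messages"
    )
```

Because the helper only reads attributes, it should work equally well when called with the real event object inside an on_summary_applied handler.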
