What is Context in Pipecat?
In Pipecat, context refers to the conversation history that the LLM uses to generate responses. The context consists of a list of alternating user/assistant messages that represents the collective history of the entire conversation.

How Context Updates During Conversations
Context updates happen automatically as frames flow through your pipeline:

User Messages:
- User speaks → InputAudioRawFrame → STT Service → TranscriptionFrame
- context_aggregator.user() receives the TranscriptionFrame and adds a user message to the context

Assistant Messages:
- LLM generates response → LLMTextFrame → TTS Service → TTSTextFrame
- context_aggregator.assistant() receives the TTSTextFrame and adds an assistant message to the context
- TranscriptionFrame: Contains user speech converted to text by the STT service
- LLMTextFrame: Contains LLM-generated responses
- TTSTextFrame: Contains bot responses converted to text by the TTS service (represents what was actually spoken)
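The flow above can be sketched in plain Python. These are simplified stand-ins for Pipecat's frame classes and aggregators, not the real API — just an illustration of how each frame type grows the shared message list:

```python
from dataclasses import dataclass

# Simplified stand-ins for Pipecat's frame types (illustration only).
@dataclass
class TranscriptionFrame:
    text: str  # user speech transcribed by the STT service

@dataclass
class TTSTextFrame:
    text: str  # bot response as actually spoken by the TTS service

# The shared context: a list of alternating user/assistant messages.
context = []

def aggregate_user(frame: TranscriptionFrame):
    """Mimics context_aggregator.user(): adds a user message."""
    context.append({"role": "user", "content": frame.text})

def aggregate_assistant(frame: TTSTextFrame):
    """Mimics context_aggregator.assistant(): adds an assistant message."""
    context.append({"role": "assistant", "content": frame.text})

# User speaks, bot answers: the context grows automatically.
aggregate_user(TranscriptionFrame("What's the weather like?"))
aggregate_assistant(TTSTextFrame("It's sunny and 72 degrees."))
```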
The TTS service processes LLMTextFrames but outputs TTSTextFrames, which represent the actual spoken text returned by the TTS provider. This ensures the context matches what users actually hear.

Setting Up Context Management
Pipecat includes a context aggregator that creates and manages context for both user and assistant messages:

1. Create the Context and Context Aggregator
2. Context with Function Calling
Context can also include tools (function definitions) that the LLM can call during conversations.

We’ll cover function calling in detail in an upcoming section. The context aggregator handles function call storage automatically.
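For illustration, a tool definition in the OpenAI function-calling format looks like this — get_weather is a hypothetical example function, and the exact schema type you pass depends on your LLM service:

```python
# A single tool definition in the OpenAI function-calling format.
# "get_weather" is a hypothetical example, not a built-in.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        },
    }
]
```

A tools list like this is passed alongside the messages when constructing the context.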
3. Add Context Aggregators to Your Pipeline
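A sketch of the pipeline ordering this step describes (a fragment only, not runnable on its own; it assumes the transport, stt, llm, tts, and context_aggregator objects from the earlier setup steps):

```python
pipeline = Pipeline([
    transport.input(),               # audio in from the user
    stt,                             # speech-to-text → TranscriptionFrame
    context_aggregator.user(),       # collects user messages
    llm,                             # generates LLMTextFrames
    tts,                             # text-to-speech → TTSTextFrame + audio
    transport.output(),              # audio out to the user
    context_aggregator.assistant(),  # collects assistant messages
])
```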
Context Aggregator Placement
The placement of context aggregator instances in your pipeline is crucial for proper operation:

User Context Aggregator
Place the user context aggregator downstream from the STT service. Since the user’s speech results in TranscriptionFrame objects pushed by the STT service, the user aggregator needs to be positioned after it to collect these frames.
Assistant Context Aggregator
Place the assistant context aggregator after transport.output(). This positioning is important because:
- The TTS service outputs TTSTextFrames in addition to audio
- The assistant aggregator must be downstream of transport.output() to collect those frames
- It ensures context updates happen word-by-word for specific services (e.g. Cartesia, ElevenLabs, and Rime)
- Your context stays updated at the word level in case an interruption occurs
Always place the assistant context aggregator after transport.output() to ensure proper word-level context updates during interruptions.

Manual Context Control
You can programmatically add new messages to the context by pushing or queueing specific frames:

Adding Messages
- LLMMessagesAppendFrame: Appends a new message to the existing context
- LLMMessagesUpdateFrame: Completely replaces the existing context with new messages
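The difference between the two can be sketched with plain lists — a simplified illustration of the semantics, not the actual frame processing:

```python
# Existing conversation history.
context = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# LLMMessagesAppendFrame semantics: add messages, keep existing history.
def append_messages(context, new_messages):
    return context + new_messages

# LLMMessagesUpdateFrame semantics: discard history, start from the new messages.
def update_messages(context, new_messages):
    return list(new_messages)

appended = append_messages(context, [{"role": "user", "content": "One more thing."}])
replaced = update_messages(context, [{"role": "system", "content": "Start over."}])
```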
Retrieving Current Context
The context aggregator provides a context property for getting the current context:
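For example (a fragment; assumes the context_aggregator created during setup, and that the context object exposes its message list as shown):

```python
# Read the current conversation history from either aggregator.
current_context = context_aggregator.user().context
print(current_context.messages)
```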
Triggering Bot Responses
You may want to manually trigger the bot to speak in two scenarios:

- Starting a pipeline where the bot should speak first
- After editing the context using LLMMessagesAppendFrame or LLMMessagesUpdateFrame
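One common pattern for this (a fragment; assumes a running PipelineTask named task inside an async context, and exact method names that may vary by pipecat version):

```python
# Queue the current context to the LLM so the bot generates a response.
await task.queue_frames([context_aggregator.user().get_context_frame()])
```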
Key Takeaways
- Context is conversation history - automatically maintained as users and bots exchange messages
- Frame types matter - TranscriptionFrame for users, TTSTextFrame for assistants
- Placement matters - user aggregator after STT, assistant aggregator after transport output
- Tools are included - function definitions and results are stored in context
- Manual control available - use frames to append messages or trigger responses when needed
- Word-level precision - proper placement ensures context accuracy during interruptions
What’s Next
Now that you understand context management, let’s explore how to configure the LLM services that process this context to generate intelligent responses.

LLM Inference
Learn how to configure language models in your voice AI pipeline