Multi-Context Websocket

This guide shows you how to build real-time voice agents using the multi-context WebSocket API.

Advanced

Orchestrating voice agents using this multi-context WebSocket API is a complex task recommended for advanced developers. For a more managed solution, consider exploring our Agents Platform product, which simplifies many of these challenges.

Multi-context WebSockets are not available for the eleven_v3 model.

Overview

Building responsive voice agents requires the ability to manage audio streams dynamically, handle interruptions gracefully, and maintain natural-sounding speech across conversational turns. Our multi-context WebSocket API for Text to Speech (TTS) is specifically designed for these scenarios.

This API extends our standard TTS WebSocket functionality by introducing the concept of "contexts." Each context operates as an independent audio generation stream within a single WebSocket connection. This allows you to:

  • Manage multiple lines of speech concurrently (e.g., agent speaking while preparing a response to a user interruption).
  • Seamlessly handle user barge-ins by closing an existing speech context and initiating a new one.
  • Maintain prosodic consistency for utterances within the same logical context.
  • Optimize resource usage by selectively closing contexts that are no longer needed.

The multi-context WebSocket API is optimized for voice applications and is not intended for generating multiple unrelated audio streams simultaneously. Each connection is limited to 5 concurrent contexts to reflect this.

This guide will walk you through connecting to the multi-context WebSocket, managing contexts, and applying best practices for building engaging voice agents.
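To make the context model concrete, here is a minimal sketch of the message shapes involved. The helper name `build_context_message` and the context IDs are illustrative, not part of the API; only the `text`, `context_id`, and `flush` fields come from the API described in this guide.

```python
import json

def build_context_message(text, context_id, flush=False):
    """Build a TTS message routed to one context on the shared connection."""
    message = {"text": text, "context_id": context_id}
    if flush:
        # Force generation of any buffered audio for this context.
        message["flush"] = True
    return json.dumps(message)

# Two independent streams multiplexed over the same WebSocket connection:
greeting = build_context_message("Hello! How can I help?", "greeting")
answer = build_context_message("Here is what I found.", "answer", flush=True)
```

Because each message carries its own `context_id`, the server can interleave audio for both streams on the same connection.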

Best practices

These best practices are essential for building responsive, efficient voice agents with our multi-context WebSocket API.

1

Use a single WebSocket connection

Establish one WebSocket connection for each end-user session. This reduces overhead and latency compared to creating multiple connections. Within this single connection, you can manage multiple contexts for different parts of the conversation.

2

Stream responses in chunks, generate sentences

When generating long responses, stream the text in smaller chunks and send the flush: true flag at the end of each complete sentence. This improves both the quality of the generated audio and the responsiveness of your agent.
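The chunk-then-flush pattern can be sketched as follows. The `_RecordingSocket` stand-in only records outgoing frames for illustration; in a real agent you would pass the live WebSocket connection instead.

```python
import asyncio
import json

async def stream_sentence(websocket, chunks, context_id):
    """Stream a sentence in small text chunks, then flush at the sentence boundary."""
    for chunk in chunks:
        await websocket.send(json.dumps({"text": chunk, "context_id": context_id}))
    # flush: true forces generation of any audio still buffered for this context.
    await websocket.send(json.dumps({"context_id": context_id, "flush": True}))

class _RecordingSocket:
    """Stand-in transport that records outgoing frames, for illustration only."""
    def __init__(self):
        self.sent = []
    async def send(self, message):
        self.sent.append(json.loads(message))

sock = _RecordingSocket()
asyncio.run(stream_sentence(sock, ["The quick brown fox ", "jumps over the lazy dog."], "turn_1"))
```

Flushing only at sentence boundaries, rather than after every chunk, keeps prosody natural across the sentence.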

3

Handle interruptions gracefully

Stream text into one context until an interruption occurs, then create a new context and close the existing one. This approach ensures smooth transitions when the conversation flow changes.

4

Manage context lifecycle

Close unused contexts promptly. The server can maintain up to 5 concurrent contexts per connection, but you should close contexts when they are no longer needed.
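This bookkeeping can be sketched as below. The `ContextTracker` class is hypothetical, not part of the API; only the close_context message and the 5-context limit come from this guide.

```python
import json

MAX_CONTEXTS = 5  # per-connection limit described above

class ContextTracker:
    """Track open context IDs so the connection stays under the limit."""
    def __init__(self):
        self.open_ids = set()

    def open(self, context_id):
        if len(self.open_ids) >= MAX_CONTEXTS:
            raise RuntimeError("Close an existing context before opening another")
        self.open_ids.add(context_id)

    def close_message(self, context_id):
        """Build the close_context message and drop the ID from our set."""
        self.open_ids.discard(context_id)
        return json.dumps({"context_id": context_id, "close_context": True})

tracker = ContextTracker()
tracker.open("greeting")
close_frame = tracker.close_message("greeting")
```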

5

Prevent context timeouts

Contexts time out after 20 seconds of inactivity by default and are closed automatically. The inactivity timeout is a WebSocket-level parameter that applies to all contexts and can be raised to up to 180 seconds if needed. Send an empty text message on a context to reset its timeout clock.
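If the default is too short, the timeout can be raised when opening the connection. The sketch below assumes the multi-context endpoint accepts the same inactivity_timeout query parameter (in seconds, capped at 180) as the standard TTS WebSocket.

```python
VOICE_ID = "your_voice_id"
MODEL_ID = "eleven_flash_v2_5"

# inactivity_timeout is assumed to behave as in the standard TTS WebSocket:
# seconds of allowed inactivity per context, up to the 180-second cap.
WEBSOCKET_URI = (
    f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/multi-stream-input"
    f"?model_id={MODEL_ID}&inactivity_timeout=180"
)
```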

Handling interruptions

When a user interrupts your agent, you should close the current context and create a new one:

async def handle_interruption(websocket, old_context_id, new_context_id, new_response):
    # Close the existing context that was interrupted
    await websocket.send(json.dumps({
        "context_id": old_context_id,
        "close_context": True
    }))
    print(f"Closed interrupted context '{old_context_id}'")

    # Create a new context for the new response
    await send_text_in_context(websocket, new_response, new_context_id)

Keeping a context alive

Contexts automatically timeout after a default of 20 seconds of inactivity. If you need to keep a context alive without generating text (for example, during a processing delay), you can send an empty text message to reset the timeout clock.

async def keep_context_alive(websocket, context_id):
    await websocket.send(json.dumps({
        "context_id": context_id,
        "text": ""
    }))

Closing the WebSocket connection

When your conversation ends, you can clean up all contexts by closing the socket:

async def end_conversation(websocket):
    # This will close all contexts and close the connection
    await websocket.send(json.dumps({
        "close_socket": True
    }))
    print("Ending conversation and closing WebSocket")

Complete conversational agent example

Requirements

  • An ElevenLabs account with an API key (learn how to find your API key).
  • Python or Node.js (or another JavaScript runtime) installed on your machine.
  • Familiarity with WebSocket communication. We recommend reading our guide on standard WebSocket streaming for foundational concepts.

Setup

Install the necessary dependencies for your chosen language:

pip install python-dotenv websockets

Create a .env file in your project directory to store your API key:

.env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

Example voice agent

This code is provided as an example and is not intended for production usage.

import os
import json
import asyncio
import websockets
from dotenv import load_dotenv

load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
VOICE_ID = "your_voice_id"
MODEL_ID = "eleven_flash_v2_5"

WEBSOCKET_URI = f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/multi-stream-input?model_id={MODEL_ID}"

async def send_text_in_context(websocket, text, context_id, voice_settings=None):
    """Send text to be synthesized in the specified context."""
    message = {
        "text": text,
        "context_id": context_id,
    }

    # Only include voice_settings for the first message in a context
    if voice_settings:
        message["voice_settings"] = voice_settings

    await websocket.send(json.dumps(message))

async def continue_context(websocket, text, context_id):
    """Add more text to an existing context."""
    await websocket.send(json.dumps({
        "text": text,
        "context_id": context_id
    }))

async def flush_context(websocket, context_id):
    """Force generation of any buffered audio in the context."""
    await websocket.send(json.dumps({
        "context_id": context_id,
        "flush": True
    }))

async def handle_interruption(websocket, old_context_id, new_context_id, new_response):
    """Handle user interruption by closing current context and starting a new one."""
    # Close the existing context that was interrupted
    await websocket.send(json.dumps({
        "context_id": old_context_id,
        "close_context": True
    }))

    # Create a new context for the new response
    await send_text_in_context(websocket, new_response, new_context_id)

async def end_conversation(websocket):
    """End the conversation and close the WebSocket connection."""
    await websocket.send(json.dumps({
        "close_socket": True
    }))

async def receive_messages(websocket):
    """Process incoming WebSocket messages."""
    try:
        async for message in websocket:
            data = json.loads(message)
            context_id = data.get("contextId", "default")

            if data.get("audio"):
                print(f"Received audio for context '{context_id}'")

            if data.get("is_final"):
                print(f"Context '{context_id}' completed")
    except (websockets.exceptions.ConnectionClosed, asyncio.CancelledError):
        print("Message receiving stopped")

async def conversation_agent_demo():
    """Run a complete conversational agent demo."""
    # Connect with API key in headers
    async with websockets.connect(
        WEBSOCKET_URI,
        max_size=16 * 1024 * 1024,
        additional_headers={"xi-api-key": ELEVENLABS_API_KEY}
    ) as websocket:
        # Start receiving messages in background
        receive_task = asyncio.create_task(receive_messages(websocket))

        # Initial agent response
        await send_text_in_context(
            websocket,
            "Hello! I'm your virtual assistant. I can help you with a wide range of topics. What would you like to know about today?",
            "greeting"
        )

        # Wait a bit (simulating user listening)
        await asyncio.sleep(2)

        # Simulate user interruption
        print("USER INTERRUPTS: 'Can you tell me about the weather?'")

        # Handle the interruption by closing current context and starting new one
        await handle_interruption(
            websocket,
            "greeting",
            "weather_response",
            "I'd be happy to tell you about the weather. Currently in your area, it's 72 degrees and sunny with a slight chance of rain later this afternoon."
        )

        # Add more to the weather context
        await continue_context(
            websocket,
            " If you're planning to go outside, you might want to bring a light jacket just in case.",
            "weather_response"
        )

        # Flush at the end of this turn to ensure all audio is generated
        await flush_context(websocket, "weather_response")

        # Wait a bit (simulating user listening)
        await asyncio.sleep(3)

        # Simulate user asking another question
        print("USER: 'What about tomorrow?'")

        # Create a new context for this response
        await send_text_in_context(
            websocket,
            "Tomorrow's forecast shows temperatures around 75 degrees with partly cloudy skies. It should be a beautiful day overall!",
            "tomorrow_weather"
        )

        # Flush and close this context
        await flush_context(websocket, "tomorrow_weather")
        await websocket.send(json.dumps({
            "context_id": "tomorrow_weather",
            "close_context": True
        }))

        # End the conversation
        await asyncio.sleep(2)
        await end_conversation(websocket)

        # Cancel the receive task
        receive_task.cancel()
        try:
            await receive_task
        except asyncio.CancelledError:
            pass

if __name__ == "__main__":
    asyncio.run(conversation_agent_demo())

Next steps

๐Ÿ” Ferramentas de Espionagem
Servidor: srv1638767 ยท BR-SP