Multi-Context Websocket

This guide shows you how to build real-time voice agents using the multi-context WebSocket API.

Advanced

Orchestrating voice agents using this multi-context WebSocket API is a complex task recommended for advanced developers. For a more managed solution, consider exploring our Agents Platform product, which simplifies many of these challenges.

Multi-context WebSockets are not available for the eleven_v3 model.

Overview

Building responsive voice agents requires the ability to manage audio streams dynamically, handle interruptions gracefully, and maintain natural-sounding speech across conversational turns. Our multi-context WebSocket API for Text to Speech (TTS) is specifically designed for these scenarios.

This API extends our standard TTS WebSocket functionality by introducing the concept of "contexts." Each context operates as an independent audio generation stream within a single WebSocket connection. This allows you to:

  • Manage multiple lines of speech concurrently (e.g., agent speaking while preparing a response to a user interruption).
  • Seamlessly handle user barge-ins by closing an existing speech context and initiating a new one.
  • Maintain prosodic consistency for utterances within the same logical context.
  • Optimize resource usage by selectively closing contexts that are no longer needed.

The multi-context WebSocket API is optimized for voice applications and is not intended for generating multiple unrelated audio streams simultaneously. Each connection is limited to 5 concurrent contexts to reflect this.

This guide will walk you through connecting to the multi-context WebSocket, managing contexts, and applying best practices for building engaging voice agents.
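To make the context model concrete, here is a minimal sketch of the message shapes involved. The helper name `build_context_message` and the context IDs are illustrative, not part of the API; only the `text`, `context_id`, and `flush` fields come from the API described in this guide.

```python
import json

def build_context_message(text, context_id, flush=False):
    """Build a TTS message routed to one context on the shared connection."""
    message = {"text": text, "context_id": context_id}
    if flush:
        # Force generation of any buffered audio for this context.
        message["flush"] = True
    return json.dumps(message)

# Two independent streams multiplexed over the same WebSocket connection:
greeting = build_context_message("Hello! How can I help?", "greeting")
answer = build_context_message("Here is what I found.", "answer", flush=True)
```

Because each message carries its own `context_id`, the server can interleave audio for both streams on the same connection.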

Best practices

These best practices are essential for building responsive, efficient voice agents with our multi-context WebSocket API.

1

Use a single WebSocket connection

Establish one WebSocket connection for each end-user session. This reduces overhead and latency compared to creating multiple connections. Within this single connection, you can manage multiple contexts for different parts of the conversation.

2

Stream responses in chunks, generate sentences

When generating long responses, stream the text in smaller chunks and send the flush: true flag at the end of each complete sentence. This improves both the quality of the generated audio and the responsiveness of your agent.
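The chunk-then-flush pattern can be sketched as follows. The `_RecordingSocket` stand-in only records outgoing frames for illustration; in a real agent you would pass the live WebSocket connection instead.

```python
import asyncio
import json

async def stream_sentence(websocket, chunks, context_id):
    """Stream a sentence in small text chunks, then flush at the sentence boundary."""
    for chunk in chunks:
        await websocket.send(json.dumps({"text": chunk, "context_id": context_id}))
    # flush: true forces generation of any audio still buffered for this context.
    await websocket.send(json.dumps({"context_id": context_id, "flush": True}))

class _RecordingSocket:
    """Stand-in transport that records outgoing frames, for illustration only."""
    def __init__(self):
        self.sent = []
    async def send(self, message):
        self.sent.append(json.loads(message))

sock = _RecordingSocket()
asyncio.run(stream_sentence(sock, ["The quick brown fox ", "jumps over the lazy dog."], "turn_1"))
```

Flushing only at sentence boundaries, rather than after every chunk, keeps prosody natural across the sentence.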

3

Handle interruptions gracefully

Stream text into one context until an interruption occurs, then create a new context and close the existing one. This approach ensures smooth transitions when the conversation flow changes.

4

Manage context lifecycle

Close unused contexts promptly. The server can maintain up to 5 concurrent contexts per connection, but you should close contexts when they are no longer needed.
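This bookkeeping can be sketched as below. The `ContextTracker` class is hypothetical, not part of the API; only the close_context message and the 5-context limit come from this guide.

```python
import json

MAX_CONTEXTS = 5  # per-connection limit described above

class ContextTracker:
    """Track open context IDs so the connection stays under the limit."""
    def __init__(self):
        self.open_ids = set()

    def open(self, context_id):
        if len(self.open_ids) >= MAX_CONTEXTS:
            raise RuntimeError("Close an existing context before opening another")
        self.open_ids.add(context_id)

    def close_message(self, context_id):
        """Build the close_context message and drop the ID from our set."""
        self.open_ids.discard(context_id)
        return json.dumps({"context_id": context_id, "close_context": True})

tracker = ContextTracker()
tracker.open("greeting")
close_frame = tracker.close_message("greeting")
```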

5

Prevent context timeouts

Contexts time out after 20 seconds of inactivity by default and are closed automatically. The inactivity timeout is a WebSocket-level parameter that applies to all contexts and can be raised to up to 180 seconds if needed. Send an empty text message on a context to reset its timeout clock.
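If the default is too short, the timeout can be raised when opening the connection. The sketch below assumes the multi-context endpoint accepts the same inactivity_timeout query parameter (in seconds, capped at 180) as the standard TTS WebSocket.

```python
VOICE_ID = "your_voice_id"
MODEL_ID = "eleven_flash_v2_5"

# inactivity_timeout is assumed to behave as in the standard TTS WebSocket:
# seconds of allowed inactivity per context, up to the 180-second cap.
WEBSOCKET_URI = (
    f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/multi-stream-input"
    f"?model_id={MODEL_ID}&inactivity_timeout=180"
)
```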

Handling interruptions

When a user interrupts your agent, you should close the current context and create a new one:

async def handle_interruption(websocket, old_context_id, new_context_id, new_response):
    # Close the existing context that was interrupted
    await websocket.send(json.dumps({
        "context_id": old_context_id,
        "close_context": True
    }))
    print(f"Closed interrupted context '{old_context_id}'")

    # Create a new context for the new response
    await send_text_in_context(websocket, new_response, new_context_id)

Keeping a context alive

Contexts automatically timeout after a default of 20 seconds of inactivity. If you need to keep a context alive without generating text (for example, during a processing delay), you can send an empty text message to reset the timeout clock.

async def keep_context_alive(websocket, context_id):
    await websocket.send(json.dumps({
        "context_id": context_id,
        "text": ""
    }))

Closing the WebSocket connection

When your conversation ends, you can clean up all contexts by closing the socket:

async def end_conversation(websocket):
    # This will close all contexts and close the connection
    await websocket.send(json.dumps({
        "close_socket": True
    }))
    print("Ending conversation and closing WebSocket")

Complete conversational agent example

Requirements

  • An ElevenLabs account with an API key (learn how to find your API key).
  • Python or Node.js (or another JavaScript runtime) installed on your machine.
  • Familiarity with WebSocket communication. We recommend reading our guide on standard WebSocket streaming for foundational concepts.

Setup

Install the necessary dependencies for your chosen language:

pip install python-dotenv websockets

Create a .env file in your project directory to store your API key:

.env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

Example voice agent

This code is provided as an example and is not intended for production usage.

import os
import json
import asyncio
import websockets
from dotenv import load_dotenv

load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
VOICE_ID = "your_voice_id"
MODEL_ID = "eleven_flash_v2_5"

WEBSOCKET_URI = f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/multi-stream-input?model_id={MODEL_ID}"

async def send_text_in_context(websocket, text, context_id, voice_settings=None):
    """Send text to be synthesized in the specified context."""
    message = {
        "text": text,
        "context_id": context_id,
    }

    # Only include voice_settings for the first message in a context
    if voice_settings:
        message["voice_settings"] = voice_settings

    await websocket.send(json.dumps(message))

async def continue_context(websocket, text, context_id):
    """Add more text to an existing context."""
    await websocket.send(json.dumps({
        "text": text,
        "context_id": context_id
    }))

async def flush_context(websocket, context_id):
    """Force generation of any buffered audio in the context."""
    await websocket.send(json.dumps({
        "context_id": context_id,
        "flush": True
    }))

async def handle_interruption(websocket, old_context_id, new_context_id, new_response):
    """Handle user interruption by closing current context and starting a new one."""
    # Close the existing context that was interrupted
    await websocket.send(json.dumps({
        "context_id": old_context_id,
        "close_context": True
    }))

    # Create a new context for the new response
    await send_text_in_context(websocket, new_response, new_context_id)

async def end_conversation(websocket):
    """End the conversation and close the WebSocket connection."""
    await websocket.send(json.dumps({
        "close_socket": True
    }))

async def receive_messages(websocket):
    """Process incoming WebSocket messages."""
    try:
        async for message in websocket:
            data = json.loads(message)
            context_id = data.get("contextId", "default")

            if data.get("audio"):
                print(f"Received audio for context '{context_id}'")

            if data.get("is_final"):
                print(f"Context '{context_id}' completed")
    except (websockets.exceptions.ConnectionClosed, asyncio.CancelledError):
        print("Message receiving stopped")

async def conversation_agent_demo():
    """Run a complete conversational agent demo."""
    # Connect with API key in headers
    async with websockets.connect(
        WEBSOCKET_URI,
        max_size=16 * 1024 * 1024,
        additional_headers={"xi-api-key": ELEVENLABS_API_KEY}
    ) as websocket:
        # Start receiving messages in background
        receive_task = asyncio.create_task(receive_messages(websocket))

        # Initial agent response
        await send_text_in_context(
            websocket,
            "Hello! I'm your virtual assistant. I can help you with a wide range of topics. What would you like to know about today?",
            "greeting"
        )

        # Wait a bit (simulating user listening)
        await asyncio.sleep(2)

        # Simulate user interruption
        print("USER INTERRUPTS: 'Can you tell me about the weather?'")

        # Handle the interruption by closing current context and starting new one
        await handle_interruption(
            websocket,
            "greeting",
            "weather_response",
            "I'd be happy to tell you about the weather. Currently in your area, it's 72 degrees and sunny with a slight chance of rain later this afternoon."
        )

        # Add more to the weather context
        await continue_context(
            websocket,
            " If you're planning to go outside, you might want to bring a light jacket just in case.",
            "weather_response"
        )

        # Flush at the end of this turn to ensure all audio is generated
        await flush_context(websocket, "weather_response")

        # Wait a bit (simulating user listening)
        await asyncio.sleep(3)

        # Simulate user asking another question
        print("USER: 'What about tomorrow?'")

        # Create a new context for this response
        await send_text_in_context(
            websocket,
            "Tomorrow's forecast shows temperatures around 75 degrees with partly cloudy skies. It should be a beautiful day overall!",
            "tomorrow_weather"
        )

        # Flush and close this context
        await flush_context(websocket, "tomorrow_weather")
        await websocket.send(json.dumps({
            "context_id": "tomorrow_weather",
            "close_context": True
        }))

        # End the conversation
        await asyncio.sleep(2)
        await end_conversation(websocket)

        # Cancel the receive task
        receive_task.cancel()
        try:
            await receive_task
        except asyncio.CancelledError:
            pass

if __name__ == "__main__":
    asyncio.run(conversation_agent_demo())

Next steps

๐Ÿ” Ferramentas de Espionagem
Servidor: srv1638767 ยท BR-SP