Realtime agents in the Python SDK are server-side, low-latency agents built on the OpenAI Realtime API over WebSocket transport.

!!! warning "Beta feature"

    Realtime agents are in beta. Expect some breaking changes as we improve the implementation.

!!! note "Python SDK boundary"

    The Python SDK does **not** provide a browser WebRTC transport. This page only covers Python-managed realtime sessions over server-side WebSockets. Use this SDK for server-side orchestration, tools, approvals, and telephony integrations. See also [Realtime transport](transport.md).
- Python 3.10 or higher
- OpenAI API key
- Basic familiarity with the OpenAI Agents SDK
If you haven't already, install the OpenAI Agents SDK:
```bash
pip install openai-agents
```

Import the realtime components:

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner
```

Define the starting agent:

```python
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep responses short and conversational.",
)
```

Prefer the nested `audio.input` / `audio.output` session settings shape for new code. For new realtime agents, start with `gpt-realtime-1.5`.

```python
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime-1.5",
            "audio": {
                "input": {
                    "format": "pcm16",
                    "transcription": {"model": "gpt-4o-mini-transcribe"},
                    "turn_detection": {
                        "type": "semantic_vad",
                        "interrupt_response": True,
                    },
                },
                "output": {
                    "format": "pcm16",
                    "voice": "ash",
                },
            },
        }
    },
)
```

`runner.run()` returns a `RealtimeSession`. The connection is opened when you enter the session context.

```python
async def main() -> None:
    session = await runner.run()

    async with session:
        await session.send_message("Say hello in one short sentence.")

        async for event in session:
            if event.type == "audio":
                # Forward or play event.audio.data.
                pass
            elif event.type == "history_added":
                print(event.item)
            elif event.type == "agent_end":
                # One assistant turn finished.
                break
            elif event.type == "error":
                print(f"Error: {event.error}")


if __name__ == "__main__":
    asyncio.run(main())
```

`session.send_message()` accepts either a plain string or a structured realtime message. For raw audio chunks, use [`session.send_audio()`][agents.realtime.session.RealtimeSession.send_audio].
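
Raw audio is normally streamed in small pieces rather than as one large buffer. The following is a minimal sketch of splitting a PCM16 buffer into fixed-duration chunks before sending. The helper name `chunk_pcm16`, the 100 ms default, and the 24 kHz mono sample rate are assumptions made for illustration; they are not part of the SDK.

```python
from collections.abc import Iterator

# Assumption: 24 kHz, mono, 16-bit little-endian PCM ("pcm16").
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2


def chunk_pcm16(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield successive slices of raw PCM16 audio, each about chunk_ms long.

    Hypothetical helper for illustration; the last chunk may be shorter.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    # Bytes per chunk = samples per second * bytes per sample * seconds.
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start : start + chunk_bytes]
```

Inside the session context, each chunk could then be forwarded with `await session.send_audio(chunk)`.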
This quickstart does not include:

- Microphone capture and speaker playback code. See the realtime examples in `examples/realtime`.
- SIP / telephony attach flows. See [Realtime transport](transport.md) and the SIP section.
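
Although speaker playback is out of scope here, a quick way to confirm that audio is arriving is to collect the bytes from `audio` events and save them as a WAV file. The sketch below uses only the standard library and assumes that `event.audio.data` holds raw 24 kHz mono PCM16 bytes; `pcm16_to_wav` is a hypothetical helper, not an SDK function.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate_hz: int = 24_000) -> bytes:
    """Wrap raw mono PCM16 audio in a WAV container and return the file bytes.

    Hypothetical helper; the 24 kHz default is an assumption about the stream.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

For example, appending each `event.audio.data` to a `bytearray` in the event loop and writing `pcm16_to_wav(bytes(collected))` to disk after `agent_end` gives a file that any audio player can open.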
Once the basic session works, the settings most people reach for next are:

- `model_name`
- `audio.input.format`, `audio.output.format`
- `audio.input.transcription`
- `audio.input.noise_reduction`
- `audio.input.turn_detection` for automatic turn detection
- `audio.output.voice`
- `tool_choice`, `prompt`, `tracing`
- `async_tool_calls`, `guardrails_settings.debounce_text_length`, `tool_error_formatter`

The older flat aliases such as `input_audio_format`, `output_audio_format`, `input_audio_transcription`, and `turn_detection` still work, but nested `audio` settings are preferred for new code.
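
When migrating existing configuration, it can help to see the flat and nested shapes side by side. The sketch below moves the four flat aliases named above into the nested `audio` shape. The alias-to-path mapping is inferred from the setting names, and `to_nested_audio_settings` is a hypothetical helper rather than something the SDK provides.

```python
from typing import Any

# Flat alias -> (side, field) inside the nested "audio" settings.
# Assumption: the mapping is inferred from the setting names.
_FLAT_TO_NESTED = {
    "input_audio_format": ("input", "format"),
    "output_audio_format": ("output", "format"),
    "input_audio_transcription": ("input", "transcription"),
    "turn_detection": ("input", "turn_detection"),
}


def to_nested_audio_settings(flat: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the settings with flat audio aliases moved under "audio".

    Values already present in the nested shape win over flat aliases.
    """
    nested = {key: value for key, value in flat.items() if key not in _FLAT_TO_NESTED}
    # Copy any existing nested audio settings so the input is not mutated.
    audio = {side: dict(values) for side, values in nested.pop("audio", {}).items()}
    for alias, (side, field) in _FLAT_TO_NESTED.items():
        if alias in flat:
            audio.setdefault(side, {}).setdefault(field, flat[alias])
    if audio:
        nested["audio"] = audio
    return nested
```

The result can be used as the `"model_settings"` value in the runner config.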
For manual turn control, use a raw `session.update` / `input_audio_buffer.commit` / `response.create` flow as described in the Realtime agents guide.
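
As a rough illustration of that flow, the sketch below builds the three raw event payloads in order. The event type names come from the text above; the body of the `session.update` payload (disabling turn detection under `audio.input`) is an assumption about the Realtime API shape and should be checked against the guide. `manual_turn_events` is a hypothetical helper, and how the events are sent through the session is left to the guide.

```python
import json
from typing import Any


def manual_turn_events() -> list[dict[str, Any]]:
    """Return the raw events for one manually controlled turn, in send order."""
    return [
        # 1. Turn off automatic turn detection (assumed payload shape).
        {
            "type": "session.update",
            "session": {
                "type": "realtime",
                "audio": {"input": {"turn_detection": None}},
            },
        },
        # 2. Mark the buffered input audio as a complete user turn.
        {"type": "input_audio_buffer.commit"},
        # 3. Ask the model to respond to the committed input.
        {"type": "response.create"},
    ]


if __name__ == "__main__":
    for event in manual_turn_events():
        print(json.dumps(event))
```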
For the full schema, see [RealtimeRunConfig][agents.realtime.config.RealtimeRunConfig] and [RealtimeSessionModelSettings][agents.realtime.config.RealtimeSessionModelSettings].
Set your API key in the environment:

```bash
export OPENAI_API_KEY="your-api-key-here"
```

Or pass it directly when starting the session:

```python
session = await runner.run(model_config={"api_key": "your-api-key"})
```

`model_config` also supports:

- `url`: Custom WebSocket endpoint
- `headers`: Custom request headers
- `call_id`: Attach to an existing realtime call. In this repo, the documented attach flow is SIP.
- `playback_tracker`: Report how much audio the user has actually heard

If you pass `headers` explicitly, the SDK will not inject an `Authorization` header for you.

When connecting to Azure OpenAI, pass a GA Realtime endpoint URL in `model_config["url"]` and explicit headers. Avoid the legacy beta path (`/openai/realtime?api-version=...`) with realtime agents. See the Realtime agents guide for details.
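
Because explicit headers replace the automatic `Authorization` header, code that adds custom headers for the OpenAI endpoint has to supply the credential itself. The sketch below shows one way to do that. `with_authorization` is a hypothetical helper and `X-Trace-Id` is a made-up header name used only for illustration; the `Bearer` scheme is the standard OpenAI API key format.

```python
def with_authorization(headers: dict[str, str], api_key: str) -> dict[str, str]:
    """Return a copy of headers with a Bearer Authorization header added.

    An Authorization header already supplied by the caller (in any letter
    case) is left untouched.
    """
    merged = dict(headers)
    if not any(name.lower() == "authorization" for name in merged):
        merged["Authorization"] = f"Bearer {api_key}"
    return merged


# Example: custom headers plus the credential the SDK would otherwise add.
example_headers = with_authorization({"X-Trace-Id": "abc"}, "your-api-key")
```

The result would be passed as `model_config={"headers": example_headers}` when calling `runner.run()`. Endpoints that authenticate with a different header can skip the helper and pass their own headers directly.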
- Read [Realtime transport](transport.md) to choose between server-side WebSocket and SIP.
- Read the Realtime agents guide for lifecycle, structured input, approvals, handoffs, guardrails, and low-level control.
- Browse the examples in `examples/realtime`.