Architecture
Voxtra is built on four primitives. Once you understand how they compose, the rest of the API is mechanical.
The big picture
The four primitives
1. ARIClient
The lowest layer. Wraps Asterisk’s REST Interface (HTTP) and Stasis event
stream (WebSocket). Methods are 1:1 with ARI endpoints — originate,
answer_channel, hangup_channel, create_bridge, record_channel,
reload_module, etc.
You rarely use this directly, but it’s there when you need to hit ARI features Voxtra hasn’t wrapped yet:
from voxtra import ARIClient
ari = ARIClient(base_url="http://pbx:8088", username="...", password="...")
await ari.connect()
modules = await ari.list_modules() # raw ARI call
await ari.reload_module("res_pjsip.so")
2. AudioSocketServer
A TCP server that accepts AudioSocket connections from the Asterisk
AudioSocket() dialplan app. Each connection is a CallSession’s media
channel — frames in, frames out — without RTP or SRTP.
The server runs on a port you choose (default: ephemeral) and is
auto-started by VoxtraApp when audio operations are first used.
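To make "frames in, frames out" concrete, here is a minimal sketch of the wire framing as the Asterisk AudioSocket protocol describes it: a 1-byte kind, a 2-byte big-endian payload length, then the payload. The helper names (`Frame`, `encode_frame`, `decode_frames`) are illustrative and are not taken from Voxtra; its own server may structure this differently.

```python
import struct
from dataclasses import dataclass

# Frame kinds from the Asterisk AudioSocket protocol description.
KIND_HANGUP = 0x00   # terminate the connection
KIND_UUID = 0x01     # payload is the 16-byte call UUID
KIND_AUDIO = 0x10    # payload is 16-bit signed linear PCM, 8 kHz mono
KIND_ERROR = 0xFF    # payload is an optional error code

HEADER = struct.Struct("!BH")  # 1-byte kind, 2-byte big-endian payload length


@dataclass
class Frame:
    kind: int
    payload: bytes


def encode_frame(kind: int, payload: bytes = b"") -> bytes:
    """Serialize one frame: header followed by the payload."""
    return HEADER.pack(kind, len(payload)) + payload


def decode_frames(buffer: bytes) -> tuple[list[Frame], bytes]:
    """Split a byte buffer into complete frames plus any unconsumed tail."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append(Frame(kind, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

Returning the unconsumed tail lets a TCP reader append the next chunk and call `decode_frames` again, since a read may end in the middle of a frame.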
3. CallSession
The developer-facing handle. Every inbound call and every successful
originate() produces one. It’s what your @app.route() handler
receives:
@app.default()
async def handle(call):  # ← call: CallSession
    await call.answer()
    digit = await call.listen_dtmf(timeout=5)
    if digit == "1":
        await call.transfer_to_queue("support")
    else:
        await call.bridge_with(other_session)
CallSession exposes:
- Lifecycle: answer, hangup, hold, unhold, transfer_to.
- Audio: audio_stream(), send_audio(), play_file().
- DTMF: listen_dtmf(), send_dtmf().
- Recording: record_start(), record_stop().
- AI shortcuts: say(text), listen(timeout=), agent.respond(text).
- Bridging: bridge_with(other), transfer_to_queue(name).
4. VoxtraApp
The orchestrator. It:
- Owns the ARIClient connection and event loop.
- Translates ARI events (StasisStart, StasisEnd, ChannelDtmfReceived, …) into VoxtraEvents.
- Looks up the right handler in the Router.
- Creates a CallSession and runs your handler in a background task.
- Optionally auto-wires a VoicePipeline (when STT + LLM + TTS are configured) and a BackendWebhook.
- Cleans up on hangup.
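The dispatch steps in this list can be sketched with plain asyncio. Everything here is a toy under stated assumptions: `ToyApp`, `ToySession`, and the `ARI_TO_EVENT` mapping are hypothetical stand-ins rather than Voxtra classes, and only the ARI event shape (a `type` field plus a `channel` object with an `id`) follows Asterisk's event format.

```python
import asyncio

# Hypothetical mapping from ARI event types to framework event names.
ARI_TO_EVENT = {
    "StasisStart": "CALL_STARTED",
    "StasisEnd": "CALL_ENDED",
    "ChannelDtmfReceived": "DTMF_RECEIVED",
}


class ToySession:
    """Stand-in for a CallSession: a channel id plus an event queue."""

    def __init__(self, channel_id):
        self.channel_id = channel_id
        self.events = asyncio.Queue()


class ToyApp:
    """Stand-in for the orchestrator's dispatch logic."""

    def __init__(self):
        self.default_handler = None
        self.sessions = {}
        self.tasks = {}

    def default(self):
        def decorator(func):
            self.default_handler = func
            return func
        return decorator

    async def dispatch(self, ari_event):
        name = ARI_TO_EVENT.get(ari_event["type"])
        if name is None:
            return  # event type this sketch does not translate
        channel_id = ari_event["channel"]["id"]
        if name == "CALL_STARTED":
            session = ToySession(channel_id)
            self.sessions[channel_id] = session
            # Run the handler in the background so dispatch never blocks.
            self.tasks[channel_id] = asyncio.create_task(
                self.default_handler(session)
            )
        elif channel_id in self.sessions:
            await self.sessions[channel_id].events.put((name, ari_event))
            if name == "CALL_ENDED":
                # Clean up on hangup: drop the session, let the task finish.
                del self.sessions[channel_id]
```

Running each handler as its own task keeps one slow call from stalling event delivery for the others, which is presumably why VoxtraApp takes the same approach.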
You construct it once, decorate handlers, and call app.run():
from voxtra import VoxtraApp
app = VoxtraApp(ari_url="...", ari_user="...", ari_password="...")
@app.default()
async def handle(call): ...
app.run()
The provider registry
STT, TTS, LLM, VAD, telephony, and media providers self-register via decorators:
from voxtra.registry import registry
from voxtra.ai.stt.base import BaseSTT
@registry.register_stt("my-provider")
class MySTT(BaseSTT):
    ...
This means third-party packages can ship new providers without touching
Voxtra’s core. Resolution is lazy: providers aren’t imported until
something asks for them by name. See voxtra.registry.
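A decorator-based registry with lazy resolution can be sketched in a few lines. This is not Voxtra's implementation: `ToyRegistry`, `declare_lazy_stt`, `get_stt`, and the `"module:attribute"` string format are assumptions made for illustration, and only the `register_stt` decorator name comes from the example above.

```python
import importlib


class ToyRegistry:
    """Illustrative name-to-provider registry with lazy imports."""

    def __init__(self):
        self._stt = {}       # name -> provider class, already loaded
        self._lazy_stt = {}  # name -> "module:attribute", not yet imported

    def register_stt(self, name):
        """Decorator that records a provider class under a name."""
        def decorator(cls):
            self._stt[name] = cls
            return cls
        return decorator

    def declare_lazy_stt(self, name, target):
        # Record where the provider lives without importing it yet.
        self._lazy_stt[name] = target

    def get_stt(self, name):
        if name not in self._stt and name in self._lazy_stt:
            # The import happens only now, on first lookup by name.
            module_path, _, attr = self._lazy_stt[name].partition(":")
            module = importlib.import_module(module_path)
            self._stt[name] = getattr(module, attr)
        try:
            return self._stt[name]
        except KeyError:
            raise LookupError(f"no STT provider registered as {name!r}") from None


registry = ToyRegistry()


@registry.register_stt("echo")
class EchoSTT:
    """Trivial provider used only to exercise the registry."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes>"
```

With this shape, a third-party package only needs to declare or register a name; the core never imports the provider module unless a configuration actually selects it.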
Telephony adapter contract
BaseTelephonyAdapter is the seam between Voxtra and the underlying PBX.
Voxtra ships an AsteriskAdapter (via ARIClient) and a LiveKitAdapter
stub. New adapters implement ten async methods (connect, listen,
answer_call, hangup_call, transfer_call, hold_call, send_dtmf,
create_media_bridge, play_audio, disconnect) and translate the
backend’s native events into VoxtraEvents.
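As a rough picture of that contract, the ten method names can be written as an abstract base class. The names come from the paragraph above, but every signature, the `ToyTelephonyAdapter` and `RecordingAdapter` classes, and the event dictionary are assumptions; the real BaseTelephonyAdapter may differ.

```python
import asyncio
from abc import ABC, abstractmethod


class ToyTelephonyAdapter(ABC):
    """Method names follow the text; all signatures are assumptions."""

    @abstractmethod
    async def connect(self): ...

    @abstractmethod
    async def listen(self): ...

    @abstractmethod
    async def answer_call(self, call_id): ...

    @abstractmethod
    async def hangup_call(self, call_id): ...

    @abstractmethod
    async def transfer_call(self, call_id, target): ...

    @abstractmethod
    async def hold_call(self, call_id): ...

    @abstractmethod
    async def send_dtmf(self, call_id, digits): ...

    @abstractmethod
    async def create_media_bridge(self, call_id): ...

    @abstractmethod
    async def play_audio(self, call_id, audio): ...

    @abstractmethod
    async def disconnect(self): ...


class RecordingAdapter(ToyTelephonyAdapter):
    """In-memory backend that logs operations instead of driving a PBX."""

    def __init__(self):
        self.log = []

    async def connect(self):
        self.log.append("connect")

    async def listen(self):
        # A real adapter would translate a native backend event here.
        return {"type": "CALL_STARTED", "call_id": "demo-1"}

    async def answer_call(self, call_id):
        self.log.append(f"answer:{call_id}")

    async def hangup_call(self, call_id):
        self.log.append(f"hangup:{call_id}")

    async def transfer_call(self, call_id, target):
        self.log.append(f"transfer:{call_id}->{target}")

    async def hold_call(self, call_id):
        self.log.append(f"hold:{call_id}")

    async def send_dtmf(self, call_id, digits):
        self.log.append(f"dtmf:{call_id}:{digits}")

    async def create_media_bridge(self, call_id):
        self.log.append(f"bridge:{call_id}")

    async def play_audio(self, call_id, audio):
        self.log.append(f"play:{call_id}:{len(audio)}")

    async def disconnect(self):
        self.log.append("disconnect")


async def demo(adapter):
    """Drive one call through any adapter that satisfies the contract."""
    await adapter.connect()
    event = await adapter.listen()
    await adapter.answer_call(event["call_id"])
    await adapter.hangup_call(event["call_id"])
    await adapter.disconnect()
    return adapter.log
```

Because the base class is abstract, a subclass that forgets one of the ten methods fails at instantiation rather than in the middle of a call.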
Switch backends in one line:
from voxtra import VoxtraApp
from voxtra.telephony.asterisk import AsteriskAdapter
app = VoxtraApp(telephony=AsteriskAdapter(...))
# or: app = VoxtraApp.with_asterisk(ari_url=..., ari_user=..., ari_password=...)
Sessions, events, and the queue
Every CallSession has an asyncio.Queue for VoxtraEvents. The
framework pushes events onto it; your handler can await them via
helpers like listen(), listen_dtmf(), or audio_stream(). Events
that flow through:
- CALL_STARTED, CALL_ANSWERED, CALL_ENDED
- USER_TRANSCRIPT, AGENT_RESPONSE (from the AI pipeline)
- DTMF_RECEIVED
- MEDIA_STARTED, MEDIA_STOPPED
Full taxonomy: Events →.
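A helper such as listen_dtmf() can be pictured as a filtered, time-limited read from that queue. The sketch below is an assumption about the mechanism, not Voxtra's code: `QueueSession` is a hypothetical stand-in, and events are modelled as simple `(kind, data)` tuples.

```python
import asyncio


class QueueSession:
    """Stand-in for a CallSession holding one event queue."""

    def __init__(self):
        self.events = asyncio.Queue()

    async def listen_dtmf(self, timeout):
        """Return the next DTMF digit, or None if none arrives in time."""
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                return None
            try:
                kind, data = await asyncio.wait_for(self.events.get(), remaining)
            except asyncio.TimeoutError:
                return None
            if kind == "DTMF_RECEIVED":
                return data
            # Other event kinds are dropped in this sketch; a real session
            # would route them to whichever helper is waiting for them.
```

Tracking a single deadline, rather than restarting the timeout after each unrelated event, keeps the total wait bounded by the caller's `timeout`.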
What runs where
| Component | Process | Notes |
|---|---|---|
| VoxtraApp | Your application | One per Stasis app namespace. |
| ARIClient | Inside VoxtraApp | Single HTTP + WS connection, auto-reconnect. |
| AudioSocketServer | Inside VoxtraApp | TCP server, accepts media legs. |
| VoicePipeline | One per active CallSession | Background asyncio task. |
| BackendWebhook | One per VoxtraApp | Owns its own httpx.AsyncClient. |
| Asterisk | Separate process | Voxtra never assumes single-host deployment. |
Production-grade defaults
Voxtra’s defaults are tuned for production traffic:
- Reconnects on ARI WS drops with configurable backoff.
- Idempotent stop signals: agents/stop and delete_room no-op cleanly on already-gone resources.
- SIP-aware idle detection: a browser observer leaving doesn’t end a call; only a missing SIP leg does.
- HMAC-signed webhooks: receivers verify origin before acting.
- Best-effort emission: webhook and recording-sink failures never propagate into the call pipeline.
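On the receiving side, verifying an HMAC-signed webhook usually looks like the sketch below. The page does not say which hash, encoding, or header Voxtra uses, so HMAC-SHA256 over the raw body with a hex digest is an assumption, as are the `sign` and `verify` helper names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time check that the signature matches the body."""
    return hmac.compare_digest(sign(secret, body), signature)
```

A receiver would call `verify` on the raw bytes it received, before parsing the JSON, and reject the request when it returns False. `hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading characters matched.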